Authors
Affiliation

Alessandro Pizzi

University of Lausanne

Andrea Lovato

Ayman El Abed

Illia Dorofieiev

Published

December 10, 2024

Abstract

Obesity has become a global health crisis, contributing to rising rates of non-communicable diseases and placing significant strain on healthcare systems worldwide. In this study, we explore the factors influencing obesity in Mexico, Peru, and Colombia through an analysis of a mixed dataset consisting of 77% synthetically generated data and 23% real-world data from 498 participants. Using data cleaning, visualization, and advanced modeling techniques, we identify key lifestyle and behavioral contributors to obesity, including dietary habits, physical activity, and demographic factors. The study employs linear regression to predict BMI and logistic regression to classify obesity, providing valuable insights into the relationship between these factors and obesity levels. While the findings are limited by the use of synthetic data and a non-representative sample, they underscore the importance of data-driven approaches in addressing public health challenges. This report aims to demonstrate the application of theoretical knowledge in a simulated environment and lay the groundwork for future studies targeting obesity reduction strategies.

1 Introduction

1.1 Project Goals

Obesity has emerged as one of the most pressing global health crises, with its prevalence nearly tripling worldwide since 1975, according to the World Health Organization (WHO). This alarming trend has fueled a dramatic rise in obesity-related diseases, including diabetes, cardiovascular conditions, and hypertension, imposing significant burdens on healthcare systems and economies. In Latin America and the Caribbean, the situation is particularly concerning: as of 2022, the Pan American Health Organization (PAHO) reported that nearly 25% of adults in the region are affected by obesity, emphasizing the urgent need for effective public health interventions. The crisis is especially acute in the countries central to this research. In 2018, Mexico recorded an adult obesity rate of 36.1%, while Peru and Colombia reported similarly worrisome rates of approximately 28% and 23%, respectively.

This widespread prevalence underscores the critical need for research focused on understanding and addressing the multifaceted factors contributing to obesity. In this context, the present study adopts an exploratory and primarily educational approach to examine the relationships between dietary habits, physical activity, and demographic variables, aiming to uncover their impact on obesity levels in Mexico, Peru, and Colombia. By leveraging a dataset consisting of 77% synthetically generated data (produced via the SMOTE algorithm) and 23% user-collected data from 498 participants, the research seeks to provide meaningful insights into this complex issue.

While the reliance on synthetic data and a non-representative sample limits direct real-world applicability, this study offers a unique opportunity to apply theoretical knowledge gained during the “Data Science in Business Analytics” course to a simulated scenario. By identifying patterns, correlations, and potential predictors of obesity, the research highlights the importance of data-driven approaches in addressing significant public health challenges. Ultimately, the findings aim to lay the groundwork for future studies and contribute to the development of informed public health strategies and healthcare policies, demonstrating the transformative potential of data analytics in managing and mitigating complex issues.

1.2 Research Questions

  • Question 1

    What are the key lifestyle and behavioral factors that significantly contribute to obesity in Mexico, Peru, and Colombia?

  • Question 2

    Can we predict whether a person will be obese based on a given combination of factors?

  • Question 3

    How can these insights be effectively leveraged to inform public health initiatives and combat the escalating health crisis?

2 Data

2.1 Sources

The dataset used in this project was obtained from the UCI Machine Learning Repository, a widely used platform for data science and machine learning projects. Originally compiled by researchers at the Universidad de la Costa, Colombia, it combines 77% synthetically generated data with 23% real-world data collected through a structured online survey. The synthetic records were created with the Synthetic Minority Over-sampling Technique (SMOTE) in Weka to address class imbalance, which makes the dataset better suited to machine learning tasks. The real-world data, gathered from 498 participants over a 30-day period, captures self-reported information on dietary habits, physical activity levels, and demographic characteristics. Both parts have limitations: synthetic data is uniform and balanced but lacks the complexity of real-world variability, while the survey data is authentic but susceptible to self-reporting bias and sampling limitations. These characteristics make the dataset a useful resource for simulating real-world challenges in healthcare analytics.

2.2 Description

The dataset consists of 2111 records and 17 attributes covering demographic, lifestyle, and behavioral factors related to obesity. The attributes are a mix of categorical and continuous variables. The first rows of the raw data are shown below, followed by a table that summarizes the name, type, and meaning of each variable.

Code
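library(here)
library(knitr)
library(dplyr)       # provides the %>% pipe
library(kableExtra)  # provides kbl(), kable_styling(), row_spec(), scroll_box()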
# Main features of the dataset
dataset_raw <- read.csv(here("data/raw/dataset_raw.csv"))
head(dataset_raw) %>%
  kbl(format = "html", caption = "First 6 Rows of the Raw Dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f0f0f0") %>%
  scroll_box(width = "100%", height = "400px")
First 6 Rows of the Raw Dataset
Gender Age Height Weight family_history_with_overweight FAVC FCVC NCP CAEC SMOKE CH2O SCC FAF TUE CALC MTRANS NObeyesdad
Female 21 1.62 64.0 yes no 2 3 Sometimes no 2 no 0 1 no Public_Transportation Normal_Weight
Female 21 1.52 56.0 yes no 3 3 Sometimes yes 3 yes 3 0 Sometimes Public_Transportation Normal_Weight
Male 23 1.80 77.0 yes no 2 3 Sometimes no 2 no 2 1 Frequently Public_Transportation Normal_Weight
Male 27 1.80 87.0 no no 3 3 Sometimes no 2 no 2 0 Frequently Walking Overweight_Level_I
Male 22 1.78 89.8 no no 2 1 Sometimes no 2 no 0 0 Sometimes Public_Transportation Overweight_Level_II
Male 29 1.62 53.0 no yes 2 3 Sometimes no 2 no 0 0 Sometimes Automobile Normal_Weight
Code
val_meaning <- c("Indicates the gender of the individual (Male/Female).", "Represents the age of participants in years.", "The height of individuals in meters.", "The weight of participants in kilograms.", "Indicates whether a family member has suffered from overweight (Yes/No).", "Indicates if participants frequently consume high-caloric foods (Yes/No).", "Scaled from 1 to 3, reflects how often vegetables are consumed (1 = Never, 3 = Always).", "Indicates the typical number of main meals consumed daily.", "Describes how often participants eat between meals (e.g., No, Sometimes, Frequently, Always).", "Indicates whether participants smoke (Yes/No).", "Scaled from 1 to 3, reflecting daily water intake (1 = Less than 1 liter, 3 = More than 2 liters).", "Whether participants monitor their calorie intake (Yes/No).", "Scaled from 0 to 4, indicating days of physical activity per week (0 = None, 4 = 4-5 days).", "Reflects daily time spent on technological devices, in hours.", "Indicates the frequency of alcohol consumption (e.g., I don't drink, Sometimes, Frequently, Always).", "Describes the primary mode of transportation (e.g., Walking, Public Transportation, Automobile).", "The target variable, classifying obesity levels into categories such as Normal Weight, Overweight (Levels I and II), and Obesity (Types I, II, III).")
desc_table <- tibble::tibble(Name = colnames(dataset_raw), Type = sapply(dataset_raw, class), Meaning = val_meaning)
desc_table %>%
  kbl(format = "html", caption = "Variable Descriptions") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f0f0f0") %>%
  column_spec(1, bold = TRUE, width = "200px") %>%
  column_spec(2, width = "150px") %>%
  column_spec(3, width = "500px") %>%
  scroll_box(width = "100%", height = "500px")
Variable Descriptions
Name Type Meaning
Gender character Indicates the gender of the individual (Male/Female).
Age numeric Represents the age of participants in years.
Height numeric The height of individuals in meters.
Weight numeric The weight of participants in kilograms.
family_history_with_overweight character Indicates whether a family member has suffered from overweight (Yes/No).
FAVC character Indicates if participants frequently consume high-caloric foods (Yes/No).
FCVC numeric Scaled from 1 to 3, reflects how often vegetables are consumed (1 = Never, 3 = Always).
NCP numeric Indicates the typical number of main meals consumed daily.
CAEC character Describes how often participants eat between meals (e.g., No, Sometimes, Frequently, Always).
SMOKE character Indicates whether participants smoke (Yes/No).
CH2O numeric Scaled from 1 to 3, reflecting daily water intake (1 = Less than 1 liter, 3 = More than 2 liters).
SCC character Whether participants monitor their calorie intake (Yes/No).
FAF numeric Scaled from 0 to 4, indicating days of physical activity per week (0 = None, 4 = 4-5 days).
TUE numeric Reflects daily time spent on technological devices, in hours.
CALC character Indicates the frequency of alcohol consumption (e.g., I don't drink, Sometimes, Frequently, Always).
MTRANS character Describes the primary mode of transportation (e.g., Walking, Public Transportation, Automobile).
NObeyesdad character The target variable, classifying obesity levels into categories such as Normal Weight, Overweight (Levels I and II), and Obesity (Types I, II, III).

The dataset was preprocessed by its original authors before release; the reported steps include normalization of continuous variables, encoding of categorical data, and removal of missing or atypical entries. Class imbalance was addressed with SMOTE (Synthetic Minority Over-sampling Technique), which generates synthetic records by interpolating between existing ones. The final dataset comprises 77% synthetic data, which improves class balance, and 23% real-world data, which adds authenticity. This combination supports a broad analysis of obesity-related factors, provided that potential biases, such as inaccuracies in self-reported information and artifacts of synthetic generation, are kept in mind.
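To make the SMOTE step more concrete, the sketch below shows its core idea in base R: a synthetic record is placed at a random point on the segment between a minority-class record and one of its k nearest neighbours. The function name `smote_sketch`, the toy data, and the choice k = 5 are assumptions made for illustration; this is not the Weka implementation used by the dataset's authors.

```r
# Minimal SMOTE-style interpolation (illustrative only, hypothetical helper).
smote_sketch <- function(X, n_new, k = 5) {
  X <- as.matrix(X)
  stopifnot(nrow(X) > k)
  D <- as.matrix(dist(X))   # pairwise Euclidean distances
  diag(D) <- Inf            # a point is not its own neighbour
  synth <- matrix(NA_real_, nrow = n_new, ncol = ncol(X),
                  dimnames = list(NULL, colnames(X)))
  for (i in seq_len(n_new)) {
    base_idx <- sample(nrow(X), 1)                # random minority record
    nn_idx   <- order(D[base_idx, ])[seq_len(k)]  # its k nearest neighbours
    mate_idx <- nn_idx[sample.int(k, 1)]          # pick one of them
    u <- runif(1)                                 # position along the segment
    synth[i, ] <- X[base_idx, ] + u * (X[mate_idx, ] - X[base_idx, ])
  }
  synth
}

set.seed(1)
# Toy minority class: 20 records with two continuous features
minority <- cbind(height = rnorm(20, 1.70, 0.05),
                  weight = rnorm(20, 50, 4))
new_rows <- smote_sketch(minority, n_new = 10)
head(new_rows)
```

Because each synthetic row is a weighted average of two existing rows, it always falls inside the range of the observed values. This is one reason why synthetic records balance the classes without adding genuinely new variability.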

2.3 Wrangling

Essential libraries for data manipulation, visualization, and clustering are loaded to begin the wrangling process and support subsequent analysis. Each package is utilized for its specific functionality, facilitating efficient and streamlined analysis:

  • dplyr: for data manipulation (e.g., filtering, summarizing);

  • tidyr: for data tidying (e.g., reshaping);

  • ggplot2: for visualization;

  • corrplot: for correlation matrix visualization;

  • ggridges: for creating ridge plots;

  • cluster: for clustering algorithms;

  • reshape2: for data reshaping, especially during visualization;

  • plotly: for converting ggplot2 charts into interactive plots with ggplotly().

Code
library(dplyr)
library(tidyr)
library(ggplot2)
library(corrplot)
library(ggridges)
library(cluster)
library(reshape2)
library(plotly)

Column names are renamed to enhance clarity and improve usability during the analysis. The updated names are designed to be shorter and more intuitive, ensuring ease of reference while retaining their original meaning and context. This adjustment simplifies code readability and helps streamline data manipulation tasks, particularly in complex analytical workflows.

Code
dataset <- dataset_raw %>%
  rename(
    family_hist = family_history_with_overweight,
    obesity_lev = NObeyesdad,
    caloric_food = FAVC,
    vegetable_food = FCVC,
    nb_meal_day = NCP,
    food_btw_meals = CAEC,
    ch2o = CH2O,
    smoke = SMOKE,
    calorie_check = SCC,
    physical_act = FAF,
    freq_alcohol = CALC,
    use_tech = TUE,
    m_trans = MTRANS,
    gender = Gender,
    age = Age,
    weight = Weight,
    height = Height
  )

The structure of the dataset is examined to identify the data types of each variable, providing critical insights for subsequent data preparation. Understanding the data types helps pinpoint columns requiring transformations, such as converting categorical variables to factors or standardizing numeric variables for analysis.

Code
str_output <- capture.output(str(dataset))
str_table <- data.frame(Structure = str_output, stringsAsFactors = FALSE)
str_table %>%
  kbl(format = "html", caption = "Original structure of the Dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f0f0f0") %>%
  scroll_box(width = "100%", height = "400px")
Original structure of the Dataset
Structure
'data.frame': 2111 obs. of 17 variables:
$ gender : chr "Female" "Female" "Male" "Male" ...
$ age : num 21 21 23 27 22 29 23 22 24 22 ...
$ height : num 1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
$ weight : num 64 56 77 87 89.8 53 55 53 64 68 ...
$ family_hist : chr "yes" "yes" "yes" "no" ...
$ caloric_food : chr "no" "no" "no" "no" ...
$ vegetable_food: num 2 3 2 3 2 2 3 2 3 2 ...
$ nb_meal_day : num 3 3 3 3 1 3 3 3 3 3 ...
$ food_btw_meals: chr "Sometimes" "Sometimes" "Sometimes" "Sometimes" ...
$ smoke : chr "no" "yes" "no" "no" ...
$ ch2o : num 2 3 2 2 2 2 2 2 2 2 ...
$ calorie_check : chr "no" "yes" "no" "no" ...
$ physical_act : num 0 3 2 2 0 0 1 3 1 1 ...
$ use_tech : num 1 0 1 0 0 0 0 0 1 1 ...
$ freq_alcohol : chr "no" "Sometimes" "Frequently" "Frequently" ...
$ m_trans : chr "Public_Transportation" "Public_Transportation" "Public_Transportation" "Walking" ...
$ obesity_lev : chr "Normal_Weight" "Normal_Weight" "Normal_Weight" "Overweight_Level_I" ...
Code
dataset <- dataset %>%
  mutate(
    gender = as.factor(gender),
    family_hist = as.factor(family_hist),
    caloric_food = as.factor(caloric_food),
    smoke = as.factor(smoke),
    calorie_check = as.factor(calorie_check),
    m_trans = as.factor(m_trans),
    obesity_lev = factor(obesity_lev, 
                         levels = c("Insufficient_Weight", "Normal_Weight", 
                                    "Overweight_Level_I", "Overweight_Level_II", 
                                    "Obesity_Type_I", "Obesity_Type_II", "Obesity_Type_III"), 
                         ordered = TRUE),
    food_btw_meals = factor(ifelse(food_btw_meals == "no", "No", food_btw_meals), 
                            levels = c("No", "Sometimes", "Frequently", "Always"), 
                            ordered = TRUE),
    freq_alcohol = factor(ifelse(freq_alcohol == "no", "No", freq_alcohol), 
                          levels = c("No", "Sometimes", "Frequently", "Always"), 
                          ordered = TRUE))


str_output <- capture.output(str(dataset))
str_table <- data.frame(Structure = str_output, stringsAsFactors = FALSE)
str_table %>%
  kbl(format = "html", caption = "Manipulated Dataset Structure") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Manipulated Dataset Structure
Structure
'data.frame': 2111 obs. of 17 variables:
$ gender : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 2 1 2 2 2 ...
$ age : num 21 21 23 27 22 29 23 22 24 22 ...
$ height : num 1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
$ weight : num 64 56 77 87 89.8 53 55 53 64 68 ...
$ family_hist : Factor w/ 2 levels "no","yes": 2 2 2 1 1 1 2 1 2 2 ...
$ caloric_food : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 2 1 2 2 ...
$ vegetable_food: num 2 3 2 3 2 2 3 2 3 2 ...
$ nb_meal_day : num 3 3 3 3 1 3 3 3 3 3 ...
$ food_btw_meals: Ord.factor w/ 4 levels "No"<"Sometimes"<..: 2 2 2 2 2 2 2 2 2 2 ...
$ smoke : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
$ ch2o : num 2 3 2 2 2 2 2 2 2 2 ...
$ calorie_check : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
$ physical_act : num 0 3 2 2 0 0 1 3 1 1 ...
$ use_tech : num 1 0 1 0 0 0 0 0 1 1 ...
$ freq_alcohol : Ord.factor w/ 4 levels "No"<"Sometimes"<..: 1 2 3 3 2 2 2 2 3 1 ...
$ m_trans : Factor w/ 5 levels "Automobile","Bike",..: 4 4 4 5 4 1 3 4 4 4 ...
$ obesity_lev : Ord.factor w/ 7 levels "Insufficient_Weight"<..: 2 2 2 3 4 2 2 2 2 2 ...

The transformations ensured the dataset was ready for analysis by restructuring categorical and ordinal variables to meet modeling requirements. Converting categorical variables into factors standardized their representation, reducing ambiguity and improving compatibility with statistical models. For ordinal variables, levels were explicitly ordered to preserve their logical progression and enhance interpretability, allowing for meaningful comparisons across categories.

The updated structure was reviewed to confirm the accuracy of these adjustments, providing confidence in the preprocessing steps. While further transformations like normalization were not applied, the focus on categorical and ordinal adjustments established a strong foundation for reliable and interpretable analysis. In particular, the levels of obesity categories, food consumption between meals, and frequency of alcohol use were arranged to reflect increasing severity or frequency, ensuring these variables captured their intended relationships and supported clear, accurate insights into the data.
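The effect of declaring a variable as an ordered factor can be seen on a small example. The sketch below uses made-up responses (the vector `snack` is hypothetical) and mirrors the approach above: the lowercase "no" is harmonized to "No" and the levels are listed from least to most frequent.

```r
# Hypothetical survey answers, including the lowercase "no" seen in the raw data
snack <- c("Sometimes", "no", "Always", "Frequently", "Sometimes")
snack <- ifelse(snack == "no", "No", snack)

snack_ord <- factor(snack,
                    levels = c("No", "Sometimes", "Frequently", "Always"),
                    ordered = TRUE)

# Ordered factors support comparisons that respect the declared order
at_least_frequent <- snack_ord >= "Frequently"
at_least_frequent

# Summaries such as max() also follow the level order, not the alphabet
highest <- max(snack_ord)
highest
```

With a plain (unordered) factor, the comparison and `max()` calls would not be meaningful, which is why the order is declared explicitly for the ordinal variables.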

Next, a numerical version of the dataset, called “dataset_num”, is created by converting categorical variables into numeric codes. Ordinal variables keep their order in the codes, whereas the codes assigned to nominal variables such as “m_trans” are arbitrary labels. This conversion is needed for the correlation matrix, which requires every variable to be numeric.

The presence of potential missing values in the transformed dataset is checked and visualized to confirm data integrity and ensure no issues have been introduced during the conversion process.

Code
# Build the numeric copy in a single pipeline so that every recoding step is
# kept (restarting from `dataset` at each step would discard the earlier ones).
dataset_num <- dataset %>%
  mutate(
    obesity_lev = recode(obesity_lev,
                         "Insufficient_Weight" = 1,
                         "Normal_Weight" = 2,
                         "Overweight_Level_I" = 3,
                         "Overweight_Level_II" = 4,
                         "Obesity_Type_I" = 5,
                         "Obesity_Type_II" = 6,
                         "Obesity_Type_III" = 7),
    freq_alcohol = recode(freq_alcohol,
                          "No" = 1,
                          "Sometimes" = 2,
                          "Frequently" = 3,
                          "Always" = 4),
    # m_trans is nominal: these codes are arbitrary labels, not a ranking
    m_trans = recode(m_trans,
                     "Automobile" = 1,
                     "Bike" = 2,
                     "Motorbike" = 3,
                     "Public_Transportation" = 4,
                     "Walking" = 5),
    food_btw_meals = recode(food_btw_meals,
                            "No" = 0,
                            "Sometimes" = 1,
                            "Frequently" = 2,
                            "Always" = 3),
    calorie_check = recode(calorie_check,
                           "no" = 0,
                           "yes" = 1)
  ) %>%
  # Remaining two-level factors (gender, family_hist, caloric_food, smoke)
  # are converted to their level codes 1 and 2
  mutate(across(where(is.factor), ~ as.numeric(.)))


str_output <- capture.output(str(dataset_num))
table_num_str <- data.frame(Structure = str_output, stringsAsFactors = FALSE)

table_num_str %>%
  kbl(format = "html", caption = "Structure of the Numerical Dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Structure of the Numerical Dataset
Structure
'data.frame': 2111 obs. of 17 variables:
$ gender : num 1 1 2 2 2 2 1 2 2 2 ...
$ age : num 21 21 23 27 22 29 23 22 24 22 ...
$ height : num 1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
$ weight : num 64 56 77 87 89.8 53 55 53 64 68 ...
$ family_hist : num 2 2 2 1 1 1 2 1 2 2 ...
$ caloric_food : num 1 1 1 1 1 2 2 1 2 2 ...
$ vegetable_food: num 2 3 2 3 2 2 3 2 3 2 ...
$ nb_meal_day : num 3 3 3 3 1 3 3 3 3 3 ...
$ food_btw_meals: num 1 1 1 1 1 1 1 1 1 1 ...
$ smoke : num 1 2 1 1 1 1 1 1 1 1 ...
$ ch2o : num 2 3 2 2 2 2 2 2 2 2 ...
$ calorie_check : num 0 1 0 0 0 0 0 0 0 0 ...
$ physical_act : num 0 3 2 2 0 0 1 3 1 1 ...
$ use_tech : num 1 0 1 0 0 0 0 0 1 1 ...
$ freq_alcohol : num 1 2 3 3 2 2 2 2 3 1 ...
$ m_trans : num 4 4 4 5 4 1 3 4 4 4 ...
$ obesity_lev : num 2 2 2 3 4 2 2 2 2 2 ...
Code
nb_na<- colSums(is.na(dataset_num))
nb_na %>%
  kbl(format = "html", caption = "Presence of Potential NA Values in the Dataset") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"), 
    full_width = FALSE, 
    position = "left"
  ) %>%
  column_spec(1, width = "100px") %>%
  column_spec(2, width = "80px") %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Presence of Potential NA Values in the Dataset
x
gender 0
age 0
height 0
weight 0
family_hist 0
caloric_food 0
vegetable_food 0
nb_meal_day 0
food_btw_meals 0
smoke 0
ch2o 0
calorie_check 0
physical_act 0
use_tech 0
freq_alcohol 0
m_trans 0
obesity_lev 0

The test results confirmed the absence of any NA values in the dataset, indicating that all variables were successfully converted to numeric format without compromising data integrity.
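When factors are converted to numbers, the resulting codes are the positions of the levels, so their meaning depends on how the levels were defined. The sketch below, built on made-up vectors (`alcohol` and `transport` are hypothetical), contrasts an ordinal variable, where the codes preserve a real order, with a nominal one, where the codes are only alphabetical positions. It also shows indicator columns from `model.matrix()` as one possible alternative for nominal variables; this alternative is not used in the report.

```r
# Ordinal variable: codes follow the declared order
alcohol <- factor(c("No", "Sometimes", "Frequently", "No"),
                  levels = c("No", "Sometimes", "Frequently", "Always"),
                  ordered = TRUE)
alcohol_codes <- as.integer(alcohol)
alcohol_codes

# Nominal variable: default levels are alphabetical, so the codes carry no ranking
transport <- factor(c("Walking", "Automobile", "Bike"))
transport_codes <- as.integer(transport)
transport_codes

# One possible alternative for nominal variables: one 0/1 column per category
transport_dummies <- model.matrix(~ transport - 1)
transport_dummies
```

A correlation computed on arbitrary codes such as `transport_codes` changes if the levels are reordered, so coefficients involving nominal variables should be read with caution.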

2.4 Spotting Mistakes and Missing Data

Check for missing values

To ensure data integrity, missing values are examined by counting “NA” values in each column of “dataset”, the factor-typed version of the data. The results are presented in a formatted, scrollable table so that any incomplete column can be spotted at a glance.

Code
missing_values <- colSums(is.na(dataset))
missing_values %>%
  kbl(format = "html", caption = "Missing Values in Each Column") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width=FALSE, position = "center") %>%
  column_spec(1, width = "100px") %>%
  column_spec(2, width = "80px") %>%
  row_spec(0, bold = TRUE, background = "#f0f0f0") %>%
  scroll_box(width = "100%", height = "400px")
Missing Values in Each Column
x
gender 0
age 0
height 0
weight 0
family_hist 0
caloric_food 0
vegetable_food 0
nb_meal_day 0
food_btw_meals 0
smoke 0
ch2o 0
calorie_check 0
physical_act 0
use_tech 0
freq_alcohol 0
m_trans 0
obesity_lev 0

The analysis confirms that all columns contain complete data, with no missing values identified. This completeness ensures a robust foundation for subsequent analysis, eliminating the need for immediate data cleaning related to missing entries.

Check for duplicates

The dataset is examined for duplicated rows to ensure data integrity and eliminate redundancy. Identifying and addressing duplicates is a crucial step in data preprocessing, as redundant entries can skew analysis results and lead to misleading conclusions. This process involves systematically scanning the dataset for identical rows and quantifying their occurrence.

Code
duplicated_rows <- sum(duplicated(dataset))
duplicated_rows
[1] 24

The detection of 24 duplicated rows in the dataset highlights the need for further preprocessing to ensure data integrity, as these redundant entries could skew analysis if not properly handled.

Code
dataset <- dataset %>%
distinct()

nrow(dataset)
[1] 2087
Code
any(duplicated(dataset))
[1] FALSE

The dataset was refined by removing duplicate entries to ensure that only unique rows are retained. A verification step confirmed that no duplicates remain, ensuring the dataset’s integrity and reliability for further analysis.
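The logic of the duplicate check can be illustrated on a few made-up rows. The sketch below uses base R only; the data frame `toy` is hypothetical, and `toy[!duplicated(toy), ]` plays the same role as `distinct()` in the code above.

```r
# Hypothetical records: row 3 repeats row 1 exactly, row 4 differs from row 2 in weight
toy <- data.frame(gender = c("Female", "Male", "Female", "Male"),
                  age    = c(21, 23, 21, 23),
                  weight = c(64, 77, 64, 80))

n_dup <- sum(duplicated(toy))        # counts rows identical to an earlier row
n_dup

toy_unique <- toy[!duplicated(toy), ]  # keeps the first occurrence of each row
nrow(toy_unique)
```

Note that `duplicated()` flags only the later copies, so the count of 24 reported above refers to redundant rows, not to the number of rows involved in a duplicate pair. Any table derived before this step, such as the numeric copy with 2111 rows, still contains the duplicates unless it is derived again from the cleaned data.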

2.5 Listing Anomalies and Outliers

A bar chart was created to visualize the distribution of obesity levels, providing a clear overview of class frequencies within the dataset. Particular attention is given to obesity levels, as this variable serves as the dependent variable in the predictive model to be developed later.

Code
g1 <- ggplot(dataset, aes(x = obesity_lev)) +
  geom_bar(fill = "skyblue", color = "black") +
  theme_minimal() +
  labs(
    title = "Class Distribution of Obesity Levels",
    x = "Obesity Level",
    y = "Count"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) #Adjusted the text for clarity

plotly_plot <- ggplotly(g1)
plotly_plot

The chart highlights a balanced distribution across all obesity levels, demonstrating the effectiveness of SMOTE in addressing class imbalance. By equalizing the representation of each category, the dataset becomes more reliable for analysis, reducing biases and ensuring a fair evaluation of patterns within the data. On the other hand, the synthetic data introduced by SMOTE may not fully reflect real-world variability, potentially leading to artificial patterns that could affect the interpretability of results.
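Class balance can also be checked numerically instead of visually. The sketch below computes class shares and a simple imbalance ratio (largest class divided by smallest class) on a made-up label vector; the vector `labels_toy` and the choice of this ratio as a summary are assumptions for illustration.

```r
# Hypothetical labels: 4, 3 and 3 observations in three classes
labels_toy <- factor(rep(c("Normal_Weight", "Obesity_Type_I", "Obesity_Type_II"),
                         times = c(4, 3, 3)))

class_counts <- table(labels_toy)
class_shares <- prop.table(class_counts)   # proportions summing to 1
class_shares

# Ratio of the largest to the smallest class; 1 means perfectly balanced
imbalance_ratio <- max(class_counts) / min(class_counts)
imbalance_ratio
```

Applied to the obesity levels, a ratio close to 1 would support the visual impression of balance given by the bar chart.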

A density plot was generated to visualize the age distribution across different obesity levels, providing insights into patterns and trends within the data.

Code
g2 <- ggplot(dataset, aes(x = age, fill = obesity_lev)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
    labs(
    title = "Age Distribution by Obesity Levels",
    x = "Age",
    y = "Density",
    fill = "Obesity Level")
plotly_plot1 <- ggplotly(g2)
plotly_plot1

This graph shows the age distribution within each obesity level and offers some insight into how the SMOTE algorithm shaped the dataset. The distributions differ visibly among categories: the lowest levels (e.g., Insufficient Weight and Normal Weight) are concentrated at younger ages, while older ages are more common in several higher categories (e.g., Overweight Level II and Obesity Type II). The pattern is not strictly monotonic, since Obesity Type III is concentrated in the early twenties (see the summary statistics by obesity level).

Notably, sharp peaks in the density curves, such as the one around age 30 in “Obesity Type I,” could indicate potential artifacts introduced during the synthetic data generation process. While these patterns align with logical demographic trends, they highlight the need for further validation to ensure that such separations and peaks represent realistic population characteristics rather than biases from data augmentation. Overall, the dataset reflects clear and interpretable patterns, but these observations suggest the importance of cautious interpretation and robust validation in subsequent analyses.

Summary statistics were computed for key variables across obesity levels to identify potential anomalies or patterns, providing a clearer understanding of how age, height, and weight vary within each category.

Code
dataset_stat <- dataset %>%
  group_by(obesity_lev) %>%
  summarize(
    Age_Mean = mean(age, na.rm = TRUE),
    Age_SD = sd(age, na.rm = TRUE),
    Height_Mean = mean(height, na.rm = TRUE),
    Height_SD = sd(height, na.rm = TRUE),
    Weight_Mean = mean(weight, na.rm = TRUE),
    Weight_SD = sd(weight, na.rm = TRUE)
  )
dataset_stat %>%
  kbl(format = "html", caption = "Summary Statistics by Obesity Level", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Summary Statistics by Obesity Level
obesity_lev Age_Mean Age_SD Height_Mean Height_SD Weight_Mean Weight_SD
Insufficient_Weight 19.8 2.7 1.7 0.1 50.0 6.0
Normal_Weight 21.8 5.1 1.7 0.1 62.2 9.3
Overweight_Level_I 23.5 6.3 1.7 0.1 74.5 8.6
Overweight_Level_II 27.0 8.1 1.7 0.1 82.1 8.5
Obesity_Type_I 25.9 7.8 1.7 0.1 92.9 11.5
Obesity_Type_II 28.2 4.9 1.8 0.1 115.3 8.0
Obesity_Type_III 23.5 2.8 1.7 0.1 120.9 15.5

The summary statistics reveal distinct differences across obesity levels. As expected, mean weight increases progressively with the obesity category, from 50.0 kg for Insufficient Weight to 120.9 kg for Obesity Type III, while its standard deviation varies without a clear trend (between 6.0 and 15.5 kg). Mean height remains nearly constant across categories (about 1.7 to 1.8 m), so differences between levels are driven mainly by weight. Age does not follow a monotonic pattern: mean age is lowest for Insufficient Weight (19.8 years) and highest for Obesity Type II (28.2 years), but falls back to 23.5 years for Obesity Type III. The age spread is widest in the intermediate categories (standard deviations of 6.3 to 8.1 years) and narrowest at the two extremes (2.7 and 2.8 years). These patterns are broadly consistent with expectations, although the tight age range in the extreme categories may partly reflect synthetic data generation and deserves further analysis.
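Since height barely changes across categories while weight rises steeply, the two can be combined into the body mass index, BMI = weight / height². The sketch below computes BMI for three rows shown in the raw data preview and assigns the standard WHO adult categories. The helper names `bmi` and `who_class` are hypothetical, and the WHO cut-offs are an assumption about how the labels relate to BMI; the dataset's finer split of overweight into Levels I and II is not reproduced here.

```r
# BMI in kg/m^2 from weight in kilograms and height in meters
bmi <- function(weight_kg, height_m) weight_kg / height_m^2

# Standard WHO adult categories (intervals closed on the left)
who_class <- function(b) {
  cut(b,
      breaks = c(-Inf, 18.5, 25, 30, 35, 40, Inf),
      labels = c("Underweight", "Normal", "Overweight",
                 "Obesity I", "Obesity II", "Obesity III"),
      right = FALSE)
}

# Three rows from the raw data preview (weight, height)
preview <- data.frame(weight = c(64, 87, 89.8),
                      height = c(1.62, 1.80, 1.78))
preview$bmi   <- bmi(preview$weight, preview$height)
preview$class <- who_class(preview$bmi)
preview
```

For these rows the computed classes (Normal, Overweight, Overweight) agree with the dataset labels Normal_Weight, Overweight_Level_I, and Overweight_Level_II, which suggests that weight and height together largely determine the target variable.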

Clustering was performed using k-means to explore the dataset’s structure and assess the coherence of the groups, with the silhouette score calculated to evaluate the quality and separation of the clusters.

Code
library(cluster)
set.seed(123)
kmeans_res <- kmeans(select(dataset, where(is.numeric)), centers = length(unique(dataset$obesity_lev)))
silhouette_score <- silhouette(kmeans_res$cluster, dist(select(dataset, where(is.numeric))))
mean_silhouette_score <- mean(silhouette_score[, "sil_width"])
mean_silhouette_score
[1] 0.4513519

The mean silhouette score of approximately 0.451 indicates moderate cohesion within clusters and reasonable separation between them. Note that the clusters are found by k-means without using the labels; only the number of clusters is set to the number of obesity levels, so the clusters do not necessarily coincide with those levels. Because the numeric variables were not standardized, the Euclidean distances are dominated by the variables with the largest ranges, chiefly weight and age. The score therefore suggests that the numeric features contain a moderately clear group structure, but it does not by itself show whether SMOTE introduced distortions. Further validation is needed to confirm the robustness of the observed patterns.
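Because k-means relies on Euclidean distance, the silhouette score can depend on whether variables are standardized first. The sketch below compares the mean silhouette width on raw and on scaled versions of a made-up two-variable dataset; the data frame `toy`, the helper `mean_sil`, and the choice of two clusters are assumptions for illustration and do not reproduce the analysis above.

```r
library(cluster)
set.seed(123)

# Hypothetical data: two groups that differ mostly in weight (kg), slightly in height (m)
toy <- data.frame(weight = c(rnorm(50, 55, 5),     rnorm(50, 110, 8)),
                  height = c(rnorm(50, 1.65, 0.07), rnorm(50, 1.72, 0.07)))

# Mean silhouette width of a k-means solution with k clusters
mean_sil <- function(X, k) {
  km  <- kmeans(X, centers = k, nstart = 10)
  sil <- silhouette(km$cluster, dist(X))
  mean(sil[, "sil_width"])
}

sil_raw    <- mean_sil(toy, k = 2)         # distances dominated by weight
sil_scaled <- mean_sil(scale(toy), k = 2)  # both variables weigh equally
c(raw = sil_raw, scaled = sil_scaled)
```

Comparing the two values gives a sense of how much a single large-range variable drives the clustering. Repeating the comparison on the real data would be one way to validate the reported score.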

2.6 Correlation Analysis

To explore relationships among variables and their association with obesity levels, a correlation matrix was computed. The analysis focuses on identifying the strength and direction of correlations between “obesity_lev” (the dependent variable) and other predictors, such as physical activity, frequency of alcohol consumption, and dietary habits. By ordering variables based on their correlation with “obesity_lev”, the matrix highlights the most influential factors in determining obesity levels. A heatmap visualization was then created to provide an intuitive representation of these relationships, with a gradient color scale indicating the strength of positive and negative correlations. This approach facilitates the identification of key variables for further analysis and modeling.

Code
#Assuming dataset_num is already defined and contains the relevant columns
cor_matrix <- cor(dataset_num %>% select("physical_act", "freq_alcohol", "obesity_lev", "age", "weight","height", "family_hist", "caloric_food", "vegetable_food", "food_btw_meals", "use_tech", "ch2o", "m_trans", "smoke","nb_meal_day", "calorie_check", "gender"),use = "complete.obs")

#Extract the correlations with 'obesity_lev'
cor_with_obesity_lev <- cor_matrix["obesity_lev",]

#Order variables by their correlation with 'obesity_lev'
ordered_vars <- names(sort(cor_with_obesity_lev, decreasing = TRUE))

#Reorder the correlation matrix based on this order
cor_matrix_ordered <- cor_matrix[ordered_vars, ordered_vars]

#Melt the ordered correlation matrix into long format
cor_long <- melt(cor_matrix_ordered)

g3 <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) + 
    geom_tile() + 
    geom_text(aes(label = round(value, 2)), color = "black", size = 2.5, vjust = 0.5, hjust = 0.5) +
    scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
    labs(title = "Correlation Heatmap Ordered by Obesity Level",
         x = "Variables", y = "Variables") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        axis.text.y = element_text(angle = 45, vjust = 1))

plot3 <- ggplotly(g3)
plot3
Code
# Create the heatmap with correlation values

# Assuming dataset_num is already defined and contains the relevant columns
cor_matrix <- cor(dataset_num %>% select("physical_act", "freq_alcohol", "obesity_lev", "age", "weight", "family_hist", "caloric_food", "vegetable_food", "food_btw_meals", "use_tech","ch2o", "height", "calorie_check", "gender"), use = "complete.obs")

# Extract the correlations with "obesity_lev"
cor_with_obesity_lev <- cor_matrix["obesity_lev",]

# Order variables by their correlation with 'obesity_lev'
ordered_vars <- names(sort(cor_with_obesity_lev, decreasing = TRUE))

# Reorder the correlation matrix based on this order
cor_matrix_ordered <- cor_matrix[ordered_vars, ordered_vars]

# Melt the ordered correlation matrix into long format
cor_long <- melt(cor_matrix_ordered)

g4 <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), color = "black", size = 2.5,
            vjust = 0.5, hjust = 0.5) + # Center text within tiles
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  labs(title = "Correlation Heatmap Ordered by Obesity Level",
       x = "Variables", y = "Variables") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        axis.text.y = element_text(angle = 45, vjust = 1) 
  )
plot4 <- ggplotly(g4)
plot4

The correlation matrices provide valuable insights into the relationships between variables and their association with obesity levels. As expected, weight exhibits a very strong positive correlation with obesity level, reinforcing its central role in defining the target variable. Family history of obesity and caloric food consumption also show moderate positive correlations, highlighting their relevance as predictive factors.

Conversely, variables such as physical activity and food consumption between meals exhibit weak or negative correlations, suggesting that their influence on obesity levels is less pronounced. These patterns align with logical trends but also underscore the need for careful consideration of multicollinearity and the relative importance of variables in predictive modeling. The heatmap’s clear organization of variables by their correlation strength aids in identifying the most impactful factors for further analysis. Overall, the results confirm that the dataset’s structure supports a robust examination of the factors influencing obesity.
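
The multicollinearity concern raised above can be screened for directly from a correlation matrix. The following is a minimal sketch on simulated data, not the report's `dataset_num`; the helper name `high_cor_pairs`, the column `bmi_like`, and the 0.7 cutoff are assumptions chosen for illustration.

```r
# Illustrative only: simulated stand-in for the numeric predictors.
set.seed(1)
n <- 200
toy <- data.frame(
  weight = rnorm(n, 85, 25),
  physical_act = runif(n, 0, 3),
  age = rnorm(n, 24, 6)
)
# Build a predictor that is strongly tied to weight on purpose.
toy$bmi_like <- toy$weight / 1.7^2 + rnorm(n, 0, 1)

# Hypothetical helper: list predictor pairs whose |r| exceeds a cutoff.
high_cor_pairs <- function(df, cutoff = 0.7) {
  cm <- cor(df, use = "complete.obs")
  # Keep only the upper triangle so each pair is reported once.
  idx <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r = cm[idx]
  )
}

pairs_found <- high_cor_pairs(toy)
print(pairs_found)
```

Pairs flagged this way are candidates for dropping or combining before modeling; the cutoff is a judgment call rather than a fixed rule.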

3 Exploratory Data Analysis (EDA)

The Exploratory Data Analysis (EDA) phase of the project was designed to uncover meaningful patterns and insights while ensuring the dataset was optimized for analysis. A correlation heatmap was employed early in the process to identify and assess relationships between variables. By comparing the initial and refined versions of the heatmap, we effectively filtered out less relevant variables, allowing the analysis to focus on the most impactful features. This step not only streamlined the dataset but also enhanced its interpretability, ensuring a more targeted exploration of key patterns.

Key variables retained after this step were selected based on their strong correlations with the target variable and their potential relevance to underlying patterns in the data.

The EDA process involved a systematic exploration of the cleaned and refined dataset, utilizing visualization tools to highlight trends, distributions, and potential anomalies.

3.0.0.1 Descriptive statistics and distribution analysis

The initial phase of the Exploratory Data Analysis (EDA) concentrated on examining the most strongly correlated variables: Age, Height, and Weight. These variables were prioritized due to their direct relevance and significant relationships with the target outcomes, as highlighted in the refined correlation heatmap.

3.0.0.1.1 Age
Code
age_summary <- summary(dataset$age)
age_sd <- sd(dataset$age, na.rm = TRUE)
sum_age_df <- tibble::tibble(
  Metric = c(names(age_summary), "Standard Deviation"),
  Value = round(c(age_summary, age_sd), 2)
)
kable(sum_age_df, format = "markdown", caption = "Age Variable Statistics")
Age Variable Statistics
Metric Value
Min. 14.00
1st Qu. 19.92
Median 22.85
Mean 24.35
3rd Qu. 26.00
Max. 61.00
Standard Deviation 6.37

The age variable exhibits a right-skewed distribution: the mean of 24.35 years exceeds the median of 22.85 years, so most participants are young and a long tail extends toward older ages. The range spans from 14 to 61 years, though the majority of individuals fall within the 20–30 age group. A standard deviation of 6.37 years reflects moderate variability in age across the dataset. This predominantly young sample may limit how well the findings generalize to older populations, where obesity-related factors might differ significantly.
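
The skew described above can also be quantified rather than read off the mean and median alone. This is a minimal sketch using a simulated age-like vector, not the report's data; `sample_skewness` is a hypothetical helper that computes the moment-based skewness coefficient.

```r
# Hypothetical helper: moment-based sample skewness (positive = longer right tail).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulated, right-skewed "age" values (not the report's data).
set.seed(42)
toy_age <- 14 + rgamma(500, shape = 2, scale = 5)

c(
  mean = mean(toy_age),
  median = median(toy_age),
  skewness = sample_skewness(toy_age)
)
```

A positive coefficient together with a mean above the median is the pattern one would expect for `dataset$age` if the description above holds.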

Age Distribution by Obesity Level

Code
g5 <- ggplot(dataset, aes(x = obesity_lev, y = age, fill = obesity_lev)) +
  geom_violin(trim = FALSE, alpha = 0.6) +
  geom_boxplot(width = 0.1, color = "black", fill = "white") +
  labs(title = "Age Distribution by Obesity Level", x = "Obesity Level", y = "Age") +
  theme_minimal() +
   theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot5 <- ggplotly(g5)
plot5

The violin plot highlights the age distribution across obesity levels, illustrating that individuals with insufficient or normal weight are predominantly younger, with ages concentrated between 14 and 30 years. In contrast, higher obesity levels, such as Obesity Type I and Type II, display a broader age range, peaking around 30–40 years. Severe obesity (Type III) is rare among younger individuals but becomes more prevalent in mid-adulthood. This visualization emphasizes the gradual increase in obesity risk with age, underlining the importance of early intervention, particularly during early and mid-adulthood, when such risks are most pronounced.

Age Distribution with LOESS Trend Line for Obesity Level

Code
g6 <- ggplot(dataset, aes(x = age, y = as.numeric(obesity_lev))) +
  geom_jitter(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE, color = "blue") +
  labs(title = "Trend of Obesity Level with Age", x = "Age", y = "Obesity Level") +
  theme_minimal()
plot6 <- ggplotly(g6)
plot6

Complementing this, the trend line graph further captures the trajectory of obesity levels with age. A sharp rise in obesity is observed from adolescence to early adulthood, peaking in the 25–30 years range. This critical transition phase is likely influenced by lifestyle factors such as reduced physical activity, increased caloric intake, and metabolic changes. After this peak, the trend reveals a gradual decline in obesity levels beyond 30 years, potentially reflecting improved health awareness, dietary adjustments, or a selection bias in older populations. These insights underscore the mid-20s to early-30s as a pivotal stage for targeted obesity prevention and intervention strategies.

3.0.0.1.2 Height
Code
height_summary <- summary(dataset$height)
height_sd <- sd(dataset$height, na.rm = TRUE)
sum_height_df <- tibble::tibble(
  Metric = c(names(height_summary), "Standard Deviation"),
  Value = round(c(height_summary, height_sd), 2)
)
kable(sum_height_df, format = "markdown", caption = "Height Variable Statistics")
Height Variable Statistics
Metric Value
Min. 1.45
1st Qu. 1.63
Median 1.70
Mean 1.70
3rd Qu. 1.77
Max. 1.98
Standard Deviation 0.09

Height distribution

Code
g7 <- ggplot(dataset, aes(x = height)) +
  geom_histogram(bins = 20, fill = "purple", color = "black", alpha = 0.7) +
  labs(title = "Height Distribution", x = "Height (m)", y = "Count") +
  theme_minimal()
plot7 <- ggplotly(g7)
plot7

The height distribution, as shown in the histogram, is approximately normal with a slight right skew. Values range from 1.45 m to 1.98 m, with the histogram peaking around 1.8 m. Both the mean and median are 1.70 m, confirming a nearly symmetrical distribution. The standard deviation of 0.09 m indicates low variability, and no extreme outliers are observed, suggesting realistic and consistent height data.

Violin Plot of Height by Obesity Level

Code
g8 <- ggplot(dataset, aes(x = obesity_lev, y = height, fill = obesity_lev)) +
  geom_violin(alpha = 0.6) +
  labs(title = "Height Distribution by Obesity Level", x = "Obesity Level", y = "Height") +
  theme_minimal() +
  theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))
plot8 <- ggplotly(g8)
plot8

The violin plot further explores the height distribution across obesity levels. Each category exhibits relatively low variability, with overlapping ranges across groups. Insufficient and Normal Weight categories have slightly narrower distributions, centered around 1.7m. As obesity levels increase, from Obesity Type I to Type III, the distributions remain consistent, indicating that height does not significantly vary with obesity classification. These findings suggest that while height remains a stable feature, weight likely plays a more decisive role in determining obesity levels.

3.0.0.1.3 Weight
Code
sum_weight_df <- tibble::tibble(
  Metric = c(names(summary(dataset$weight)), "Std. Dev"),
  Value = round(c(summary(dataset$weight), sd(dataset$weight, na.rm = TRUE)), 2)
)
kable(sum_weight_df, format = "markdown", caption = "Weight Variable Statistics")
Weight Variable Statistics
Metric Value
Min. 39.00
1st Qu. 66.00
Median 83.10
Mean 86.86
3rd Qu. 108.02
Max. 173.00
Std. Dev 26.19

Density plot for weight distribution by gender

Code
g9 <- ggplot(dataset, aes(x = weight, fill = gender)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot of Weight by Gender", x = "Weight", y = "Density") +
  scale_fill_manual(values = c("pink", "lightblue"), name = "Gender", labels = c("Female", "Male")) +
  theme_minimal()
plot9 <- ggplotly(g9)
plot9

The density plot highlights distinct differences in weight distribution between genders. Females generally exhibit lower weights, with a peak around 70 units, whereas males show peaks at 85 and 115 units, reflecting a tendency toward higher weights. An overlapping region between 80 and 90 units indicates common weight ranges for both genders, though the distinct peaks underscore gender-based differences. Weight in the dataset ranges from 39 to 173 units, with a mean of 86.86 units, a median of 83.10 units, and a standard deviation of 26.19, indicating moderate variability.

Ridgeline Plot of Weight by Obesity Level

Code
ggplot(dataset, aes(x = weight, y = obesity_lev, fill = obesity_lev)) +
  geom_density_ridges(scale = 0.9, alpha = 0.6) +
  labs(title = "Ridgeline Plot of Weight by Obesity Level", x = "Weight", y = "Obesity Level") +
  theme_minimal() +
  theme(legend.position = "none")

Code
# The interactive (ggplotly) version of this plot could not be made to work, so the static plot is shown.

The ridgeline plot further illustrates the relationship between weight and obesity levels. As obesity levels increase, the weight distribution shifts consistently toward higher values. Categories such as “Insufficient Weight” and “Normal Weight” cluster at lower ranges, while higher obesity types (I, II, and III) peak at significantly greater weights. This clear progression confirms a strong positive association between weight and obesity levels, reinforcing the centrality of weight in obesity classification. The overall mean weight of 86.86 units and standard deviation of 26.19 reflect this spread across the obesity categories.

3.0.0.1.4 Height and Weight

To examine the relationship between height and weight more closely, trends across obesity levels were explored using scatter plots. These plots reinforce theoretical expectations and place weight variations in the context of height ranges for each obesity classification.

Scatter Plot (height vs weight), colored by obesity level

Code
g11 <- ggplot(dataset, aes(x = height, y = weight, color = obesity_lev)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, aes(group = obesity_lev)) +  # Adds a trend line for each obesity level
  ggtitle("Scatter Plot of Weight vs Height by Obesity Level") +
  theme_minimal() +
  labs(x = "Height", y = "Weight", color = "Obesity Level")
plot11 <- ggplotly(g11)
plot11

Facet Grid for Height and Weight by Obesity Level

Code
g12 <- ggplot(dataset, aes(x = height, y = weight)) +
  geom_point(alpha = 0.7, aes(color = obesity_lev)) +
  facet_wrap(~ obesity_lev) +
  ggtitle("Facet Grid of Weight and Height by Obesity Level") +
  theme_minimal() +
  labs(x = "Height", y = "Weight", color = "Obesity Level") +
  theme(legend.position = "none")
plot12 <- ggplotly(g12)
plot12

The first scatter plot presents an overview, illustrating the general trend of increasing weight with height, stratified by obesity levels. To better isolate and visualize these individual trends, the initial graph is expanded into a facet grid, offering a clearer perspective on the separate trends within each obesity category and highlighting distinct relationships and ranges.

Correlation between height and weight

Code
correlation_height_weight <- cor(dataset$height, dataset$weight, use = "complete.obs")
correlation_height_weight
[1] 0.457468

The observed correlation between height and weight (r = 0.457) aligns with findings in existing literature, confirming a moderate positive relationship and reinforcing the expectation that taller individuals generally weigh more, though the strength of this association varies slightly across obesity levels.
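
To check how the height and weight association varies across obesity levels, the same correlation can be computed within each level. The following is a sketch on simulated data whose column names mirror the report; the values, the two level names used, and the data-generating formula are assumptions for illustration only.

```r
# Simulated stand-in for the report's data (column names mirror the report).
set.seed(7)
n <- 300
toy <- data.frame(
  obesity_lev = factor(sample(c("Normal_Weight", "Obesity_Type_I"), n, replace = TRUE)),
  height = rnorm(n, 1.70, 0.09)
)
# Weight depends on height plus a level-specific offset and noise.
toy$weight <- 60 * toy$height +
  ifelse(toy$obesity_lev == "Obesity_Type_I", 30, -35) +
  rnorm(n, 0, 4)

# Overall correlation, then one correlation per obesity level.
overall_r <- cor(toy$height, toy$weight, use = "complete.obs")
by_level_r <- sapply(
  split(toy, toy$obesity_lev),
  function(d) cor(d$height, d$weight, use = "complete.obs")
)

round(c(overall = overall_r, by_level_r), 3)
```

Applied to the real `dataset`, the same `split()` and `sapply()` pattern would return one coefficient per obesity level, which can then be compared with the pooled value.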

With the analysis of age, height, and weight completed, attention shifts to exploring the remaining variables in the dataset. These variables, while less directly correlated with the target outcomes, offer critical insights into behavioral, lifestyle, and environmental factors that may influence obesity levels.

3.0.0.1.5 Food between meals
Code
# Dodged Bar Chart for food_btw_meals by obesity levels
g13 <- ggplot(dataset, aes(x = food_btw_meals, fill = obesity_lev)) +
   geom_bar(position = "dodge", color = "black") +
   ggtitle("Dodged Bar Chart for Food Between Meals by Obesity Levels") +
   labs(x = "Food Between Meals", y = "Count", fill = "Obesity Levels") +
   theme_minimal() +
   theme(
         plot.title = element_text(hjust = 0.5, size = 14))

plot13 <- ggplotly(g13)
plot13
Code
# Stacked Bar Chart of Food Between Meals by Obesity Level (Proportions within each Obesity Level)
g14 <- ggplot(dataset, aes(x = obesity_lev, fill = food_btw_meals)) +
    geom_bar(position = "fill") + # Stacked bar chart with proportions
    scale_y_continuous(labels = scales::percent_format(accuracy = 1)) + # Format y-axis as percentages
    ggtitle("Proportion of Food Between Meals Across Obesity Levels") + # Shortened and clear title
    labs(x = "Obesity Levels", y = "Proportion (%)", fill = "Food Between Meals") + # Correct axis and legend labels
    theme_minimal() +
    theme(
        axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis text for readability
        plot.title = element_text(hjust = 0.5, size = 14) # Center and style the title
    )
plot14 <- ggplotly(g14)
plot14

The charts provide a clear illustration of how the frequency of eating between meals varies across obesity levels. The most dominant behavior across all categories is “Sometimes,” which peaks in intermediate levels like Normal Weight and Overweight Level I, reflecting a common pattern of moderate snacking. However, as obesity levels increase to Obesity Types I–III, the responses for “Frequently” and “Always” diminish, while “Sometimes” becomes even more prevalent. This shift could indicate that higher obesity levels are more associated with habitual moderate snacking rather than excessive meal-snacking frequency. On the other hand, “No” responses remain negligible across all obesity levels, suggesting that eating between meals is almost universal in this population. This pattern underscores the importance of examining not just the frequency but also the quality and context of snacking as potential contributors to obesity progression.

3.0.0.1.6 High-caloric food consumption
Code
# Grouped Bar Chart of High-Caloric Food by Obesity Level (Counts)
g16 <- ggplot(dataset, aes(x = obesity_lev, fill = caloric_food)) +
  geom_bar(
    position = "dodge",
    color = "black"
  ) +
  ggtitle("Grouped Bar Chart of High-Caloric Food Consumption Across Obesity Levels") +
  labs(x = "Obesity Levels", y = "Count", fill = "High-Caloric Food Consumption") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5, size = 14)
  )
plot16 <- ggplotly(g16)
plot16

The grouped bar chart highlights a clear trend of increased high-caloric food consumption as obesity levels rise. High-caloric food consumption (“yes”) dominates across all categories, surpassing 75% of responses and becoming nearly universal in higher obesity levels (Obesity Type I–III). In contrast, “no” responses are more prominent in lower categories like Insufficient Weight and Normal Weight but remain relatively infrequent.

Code
percentage_high_caloric_consumers <- mean(dataset$caloric_food == "yes") * 100
percentage_high_caloric_consumers
[1] 88.35649

More precisely, a notable 88.4% of participants report frequent consumption of high-calorie foods, a behavior strongly associated with weight gain. This underscores the critical importance of dietary interventions aimed at reducing high-calorie intake to address obesity progression effectively.
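
The overall percentage hides how the share of "yes" responses differs between obesity levels. A minimal sketch of the per-level breakdown follows; the counts are invented for illustration, and only the column names are taken from the report.

```r
# Simulated stand-in; the real analysis would use dataset$obesity_lev and dataset$caloric_food.
toy <- data.frame(
  obesity_lev = factor(
    c(rep("Normal_Weight", 10), rep("Obesity_Type_I", 10)),
    levels = c("Normal_Weight", "Obesity_Type_I")
  ),
  caloric_food = c(rep("yes", 7), rep("no", 3), rep("yes", 9), rep("no", 1))
)

# Row-wise proportions: each obesity level sums to 1.
share_by_level <- prop.table(table(toy$obesity_lev, toy$caloric_food), margin = 1)
round(100 * share_by_level, 1)
```

With `margin = 1`, each row is normalized separately, so levels with few participants remain comparable to larger ones.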

3.0.0.1.7 Alcohol consumption

Frequency of alcohol consumption

Code
# Filter out "Always" responses from the dataset
filtered_dataset <- dataset %>%
  filter(freq_alcohol != "Always")

# Dodged Bar Chart for freq_alcohol by Obesity Levels (excluding "Always")
g17 <- ggplot(filtered_dataset, aes(x = freq_alcohol, fill = obesity_lev)) +
   geom_bar(position = "dodge", color = "black") +
   ggtitle("Dodged Bar Chart for Alcohol Consumption by Obesity Levels") +
   labs(x = "Alcohol Consumption Frequency", y = "Count", fill = "Obesity Levels") +
   theme_minimal() +
   theme(
         plot.title = element_text(hjust = 0.5, size = 14)) # Center and style the title

plot17 <- ggplotly(g17)
plot17

Regarding alcohol consumption, the chart shows that “Sometimes” is the dominant alcohol consumption frequency across all obesity levels, particularly in Normal Weight, Overweight Level I, and II categories. As obesity increases, “Frequently” becomes slightly more prominent, especially in Obesity Type III, while “No” responses decrease, being more common in lower obesity levels such as Insufficient and Normal Weight. The “Always” responses are excluded from this chart due to their near absence in the dataset, highlighting that excessive alcohol consumption is rare. This trend underlines the potential relationship between moderate-to-frequent alcohol consumption and higher obesity levels, emphasizing its importance for obesity-related behavioral research.

Code
# Prepare the data summary for 'Sometimes' and 'No' responses
data_summary <- dataset %>%
  filter(freq_alcohol %in% c("Sometimes", "No")) %>%
  group_by(obesity_lev, freq_alcohol) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(obesity_lev) %>%
  mutate(
    total = sum(count),
    proportion = count / total
  ) %>%
  ungroup()

# Visualization with updated title
g18 <- ggplot(data_summary, aes(x = obesity_lev, y = proportion, group = freq_alcohol, color = freq_alcohol)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +  # Format y-axis as percentages
  ggtitle("Proportion of 'Sometimes' and 'No' Alcohol Responses by Obesity Level") +
  labs(x = "Obesity Level", y = "Proportion (%)", color = "Alcohol Frequency") +
  scale_color_manual(values = c("No" = "purple", "Sometimes" = "gold")) + # Improved color scheme
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5, size = 14),  # Center and style title
    legend.position = "top"
  )
plot18 <- ggplotly(g18)
plot18

To better illustrate the trends in alcohol consumption frequency across obesity levels, this graph highlights the shifting proportions of individuals consuming alcohol “Sometimes” and abstaining (“No”). The proportions are computed among these two response groups only, within each obesity level. The proportion of individuals who drink alcohol “Sometimes” shows a steady increase with higher obesity levels, peaking in Obesity Type III. Conversely, the proportion of those who abstain from alcohol decreases as obesity levels rise, suggesting an inverse relationship between abstention and obesity severity.

This pattern raises questions about the potential interaction between alcohol consumption frequency and caloric food preferences, as both behaviors appear to be associated with higher obesity levels. Investigating this interaction could provide insights into whether a combination of moderate alcohol consumption and high-calorie food preferences exerts a compounded effect on obesity risk. Understanding these combined lifestyle factors could inform strategies aimed at mitigating obesity progression more effectively.
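
One way such an interaction could be examined is with an interaction term in a logistic regression. The sketch below uses simulated data with an interaction built in by assumption; the variable names `drinks_sometimes` and `obese` and all coefficients are hypothetical and do not come from the report.

```r
# Simulated data for illustration only (not the report's dataset).
set.seed(123)
n <- 1000
toy <- data.frame(
  drinks_sometimes = rbinom(n, 1, 0.6),
  caloric_food = rbinom(n, 1, 0.8)
)
# Assumed data-generating process with a positive interaction on the log-odds scale.
log_odds <- -1.5 + 0.3 * toy$drinks_sometimes + 0.5 * toy$caloric_food +
  0.8 * toy$drinks_sometimes * toy$caloric_food
toy$obese <- rbinom(n, 1, plogis(log_odds))

# `a * b` expands to a + b + a:b, so this model has both main effects and the interaction.
fit_int <- glm(obese ~ drinks_sometimes * caloric_food, data = toy, family = binomial)
fit_main <- glm(obese ~ drinks_sometimes + caloric_food, data = toy, family = binomial)

# Likelihood-ratio test: does adding the interaction term improve the fit?
anova(fit_main, fit_int, test = "Chisq")
```

A small p-value in the comparison would suggest that the two behaviors combine beyond their separate effects, although observational data of this kind cannot establish a causal mechanism.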

3.0.0.1.8 Daily Calorie Monitoring
Code
# Dodged Bar Chart for calorie_check by Obesity Levels
g19 <- ggplot(dataset, aes(x = calorie_check, fill = obesity_lev)) +
   geom_bar(position = "dodge", color = "black") +
   ggtitle("Dodged Bar Chart for Calorie Checking by Obesity Levels") +
   labs(x = "Calorie Check", y = "Count", fill = "Obesity Levels") +
   theme_minimal() +
   theme(
         plot.title = element_text(hjust = 0.5, size = 14)) # Center and style the title
plot19 <- ggplotly(g19)
plot19
Code
data_summary <- dataset %>%
  group_by(obesity_lev, calorie_check) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(obesity_lev) %>%  # compute proportions within each obesity level
  mutate(total = sum(count), proportion = count / total) %>%
  ungroup()

# Proportion of Calorie Checking by Obesity Level
g20 <- ggplot(data_summary, aes(x = obesity_lev, y = proportion, group = calorie_check, color = calorie_check)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent) +
  scale_color_manual(values = c("no" = "lightcoral", "yes" = "lightblue")) +
  labs(title = "Proportion of Calorie Checking by Obesity Level", x = "Obesity Level", y = "Proportion", color = "Calorie Check") +
  theme_minimal() +
  theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))  # keep the legend so "yes" and "no" lines can be told apart
plot20 <- ggplotly(g20)
plot20

The Dodged Bar Chart highlights two main trends regarding calorie-checking behavior across obesity levels: a significant increase in “Yes” responses as obesity levels rise, particularly from Overweight Level II onward, and a decrease in “No” responses, which are more prevalent in lower obesity levels like Normal Weight and Insufficient Weight. The proportion graph simplifies these trends by clearly illustrating the proportional shift between “Yes” and “No” responses, making the contrast between lower and higher obesity levels more visually apparent. Together, these visualizations emphasize a potential association between obesity severity and an increased tendency to check calorie intake, suggesting heightened dietary awareness in higher obesity categories.
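
Whether the apparent association between calorie checking and obesity level exceeds what chance would produce can be tested with a chi-square test of independence. The sketch below uses invented counts, not the report's data; only the variable names are borrowed from the report.

```r
# Toy counts, invented for illustration (rows: obesity level, columns: calorie_check).
toy_counts <- matrix(
  c(40, 10,
    45,  5,
    48,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    obesity_lev = c("Normal_Weight", "Overweight_Level_I", "Obesity_Type_I"),
    calorie_check = c("no", "yes")
  )
)

# Pearson chi-square test of independence between the two categorical variables.
test_res <- chisq.test(toy_counts)
test_res$expected  # counts expected if the two variables were independent
test_res$p.value
```

On the real data, the contingency table would come from `table(dataset$obesity_lev, dataset$calorie_check)`; with sparse cells, `chisq.test(..., simulate.p.value = TRUE)` is one alternative.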

3.0.0.1.9 Vegetable consumption
Code
g21 <- ggplot(dataset, aes(x = vegetable_food)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "lightgreen", color = "black", alpha = 0.6) +
  geom_density(color = "darkgreen", linewidth = 1) +
  ggtitle("Histogram and Density of Vegetable Food Consumption") +
  theme_minimal() +
  labs(x = "Vegetable Food Consumption", y = "Density")
plot21 <- ggplotly(g21)
plot21
Code
g22 <- ggplot(dataset, aes(x = weight, y = vegetable_food, color = obesity_lev)) +
    geom_point(alpha = 0.6) +
    geom_smooth(method = "loess", se = FALSE, color = "black") +
    labs(title = "Scatterplot of Weight vs Vegetable Food Consumption", 
         x = "Weight", 
         y = "Vegetable Food Consumption") +
    theme_minimal() +
    coord_cartesian(xlim= c(40, 135), ylim= c(2, 3))
plot22 <- ggplotly(g22)
plot22

The scatterplot provided with the trend line illustrates a distinct, non-linear relationship: vegetable consumption initially decreases as weight increases but then begins to rise again at higher weight levels.

This pattern suggests that individuals with lower weight, particularly those in the Insufficient Weight and Normal Weight categories, tend to report higher vegetable consumption. As weight progresses toward the Overweight categories, vegetable consumption decreases slightly, indicating a possible reduction in healthy dietary habits. However, at the upper end of the weight spectrum, corresponding to Obesity Type II and Obesity Type III, vegetable consumption increases again, potentially due to dietary interventions or awareness in this group.

The trend reveals two possible key insights:

  • A dip in vegetable consumption occurs in intermediate weight ranges, aligning with the overweight population.
  • The sharp increase in vegetable consumption among the most obese individuals may reflect lifestyle adjustments prompted by health concerns or medical advice.
3.0.0.1.10 Physical activity
Code
g23 <- ggplot(dataset, aes(x = physical_act)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "skyblue", color = "black", alpha = 0.6) +
  geom_density(color = "darkblue", linewidth = 1) +
  ggtitle("Histogram and Density of Physical Activity") +
  theme_minimal() +
  labs(x = "Physical Activity", y = "Density")
plot23 <- ggplotly(g23)
plot23

The histogram and density plot reveal that physical activity levels have distinct peaks at 0, 1, 2, and 3, suggesting that these values are common reported levels. Intermediate values, likely due to synthetic data or SMOTE, are also present but less frequent.

Violin plot by category

Code
g24 <- ggplot(dataset, aes(x = obesity_lev, y = physical_act, fill = obesity_lev)) +  
  geom_violin(trim = FALSE, alpha = 0.6) +
  geom_boxplot(width = 0.1, color = "black", fill = "white") +
    ggtitle("Violin Plot of Physical Activity by Obesity Level") +
  labs(x = "Obesity Level", y = "Physical Activity") +
  theme_minimal() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot24 <- ggplotly(g24)
plot24

Physical activity levels show a slight decline as obesity levels increase, particularly evident in the narrowing distributions and lower medians observed for Obesity Type II and Obesity Type III categories. In contrast, the Insufficient Weight and Normal Weight groups exhibit higher physical activity levels, as reflected by their broader and more symmetrical distributions.

The graph reveals a distinct trend: individuals in lower obesity categories engage in more physical activity compared to those in higher obesity categories. This trend suggests an inverse relationship between physical activity and obesity levels.
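
The inverse relationship read from the violin plot can be summarized with a rank correlation, which suits an ordinal outcome such as obesity level. This is a minimal sketch on simulated data; the level names, their ordering, and the built-in downward trend are assumptions for illustration.

```r
# Simulated stand-in; level names and ordering are assumptions for illustration.
levels_ordered <- c("Insufficient_Weight", "Normal_Weight",
                    "Overweight_Level_I", "Obesity_Type_I")
set.seed(11)
n <- 400
lev_code <- sample(seq_along(levels_ordered), n, replace = TRUE)
toy <- data.frame(
  obesity_lev = factor(levels_ordered[lev_code], levels = levels_ordered, ordered = TRUE),
  # Activity declines with level by construction, clipped to the 0-3 scale.
  physical_act = pmin(3, pmax(0, 2 - 0.4 * lev_code + rnorm(n, 0, 0.6)))
)

# Spearman's rho works on ranks, so only the ordering of the levels matters.
rho <- cor(as.numeric(toy$obesity_lev), toy$physical_act, method = "spearman")
rho
```

A negative `rho` on the real data would support the inverse relationship described above, with its magnitude indicating how strong the trend is.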

3.0.0.1.11 Water consumption
Code
g25 <- ggplot(dataset, aes(x = ch2o)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "skyblue", color = "black", alpha = 0.6) +
  geom_density(color = "darkblue", linewidth = 1) +
  ggtitle("Histogram and Density of Consumption of Water") +
  theme_minimal() +
  labs(x = "CH2O", y = "Density")
plot25 <- ggplotly(g25)
plot25

This histogram and density plot of daily water consumption (CH2O) shows a clear peak at 2 liters, indicating that most individuals consume around this amount. This aligns with scientific literature, which generally recommends an average daily water intake of about 2 liters for optimal health.

Trend Line of Weight vs Water Consumption

Code
# Scatterplot with a LOESS trend line
g26 <- ggplot(dataset, aes(x = weight, y = ch2o, color = obesity_lev)) +
    geom_point(alpha = 0.6) +
    geom_smooth(method = "loess", se = FALSE, color = "black") +
    labs(title = "Scatterplot of Weight vs Water Consumption", x = "Weight", y = "Water Consumption (ch2o)") +
    theme_minimal() +
    coord_cartesian(xlim = c(35, 135))
plot26 <- ggplotly(g26)
plot26

The scatterplot visualizes the relationship between weight and water consumption (ch2o), categorized by obesity levels. The trend line reveals a slightly increasing pattern of water consumption as weight increases, though the relationship is relatively weak and mostly linear.

This pattern suggests that individuals with Insufficient Weight and Normal Weight categories generally report slightly lower water consumption compared to individuals in the higher weight categories, such as Obesity Type II and III. The increase in water consumption among higher weight groups could indicate attempts to adopt healthier habits or increased hydration needs due to larger body sizes. However, the relatively flat trend across most weight ranges suggests that water consumption does not vary dramatically across different weight categories, highlighting a potential area for targeted interventions to promote hydration as a component of healthy dietary behavior.

3.0.0.1.12 Technology utilization

Density of Use of Technology by Obesity Level

Code
g28 <- ggplot(dataset, aes(x = use_tech, fill = obesity_lev)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density of Use of Technology by Obesity Level", x = "Use of Technology", y = "Density") +
  theme_minimal()
plot28 <- ggplotly(g28)
plot28

This density plot provides a perspective on the use of technology across different obesity levels. A striking feature is the sharp, dominant peak in Obesity Type III (yellow) around the value of 1. This pattern diverges notably from the smoother and more evenly distributed curves seen in other obesity categories, suggesting a unique behavioral trend in this group.

The peak indicates a strong clustering of individuals in Obesity Type III who report moderate use of technology, which may reflect consistent engagement with technology-based activities such as sedentary work, entertainment, or even health-monitoring applications. In contrast, other obesity categories, such as Obesity Type II and Overweight Level II, exhibit more balanced distributions without a single dominant peak, hinting at more varied technology usage patterns.

This observation raises interesting questions about the role of technology in shaping lifestyle behaviors in Obesity Type III individuals. It may point to a reliance on technology that correlates with a sedentary lifestyle, a known risk factor for obesity. Alternatively, it could reflect targeted interventions or habits specific to this group.

The Exploratory Data Analysis (EDA) phase provided a comprehensive understanding of the dataset, offering key insights into the relationships between various behavioral, lifestyle, and demographic factors and obesity levels. By focusing on critical variables, the EDA revealed patterns and trends that are integral to the modeling process.

4 Analysis

The analysis phase is dedicated to the development, refinement, and comprehensive evaluation of the predictive models, meticulously designed to directly address the previously defined research questions.

4.1 Methods

The modeling process is structured to address the two key research questions:

  1. identifying the most significant lifestyle and behavioral factors contributing to obesity in Mexico, Peru, and Colombia;

  2. predicting whether a person is obese from a given combination of factors.

4.1.1 Logistic Regression Model

To address the key research questions, a logistic regression model will be employed to estimate the probability that an individual belongs to one of two categories: obese or not obese. Weight and height will be excluded as predictors because they are used directly to calculate BMI, which serves as the basis for the obesity levels categorized in the dataset. Including these variables would make the target an almost deterministic function of the predictors, masking the contribution of every other factor. By excluding weight and height, the focus shifts to behavioral and lifestyle factors, such as dietary habits, physical activity, and demographic characteristics, to better understand their influence on obesity risk.
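
For reference, the binary logistic model described above can be written as follows. The symbols are introduced here for illustration only: p_i is the probability that individual i is obese, x_ij is the value of predictor j for that individual, and the beta terms are coefficients estimated from the data.

```latex
\[
\log\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
p_i = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right)}
\]
```

The left-hand form shows that the model is linear in the log-odds, while the right-hand form gives the predicted probability that is later compared against a classification threshold.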

While logistic regression provides a clear and interpretable framework for estimating probabilities, it inherently limits the analysis to a binary classification. This restriction prevents the exploration of the full spectrum of obesity levels, such as Obesity Type I, II, or III, as classified in the dataset. Despite this limitation, logistic regression is a robust method for quantifying the relationships between independent variables and the binary outcome. Feature selection techniques will ensure that only the most relevant predictors are retained, and the model’s performance will be rigorously evaluated using metrics such as accuracy, precision, recall, F1-score, and ROC-AUC, ensuring reliable and actionable insights.

4.1.2 Insights and Limitations

Regression analysis helps us understand how predictors influence outcomes, with logistic regression classifying individuals as obese or not obese. As discussed in the previous sections, the dataset offers a mix of advantages and challenges: synthetic data ensures balanced representation but lacks the complexity of real-world patterns, while user-collected data adds variability but is prone to biases. Logistic regression simplifies the analysis by focusing on a binary outcome, leaving out the nuanced gradations of obesity, and assumes that each predictor has a linear effect on the log-odds of the outcome, which may not fully capture complex relationships. Despite these limitations, the model offers insights into obesity risk, serving as a valuable exercise and foundation for future data explorations, even if not directly applicable to real-world scenarios.

4.2 Objectives of the Selected Method

4.2.1 Logistic Regression Model Development

Data Loading and Processing

To align with the requirements of a logistic regression model, it was necessary to modify the dataset’s target variable. The original variable, obesity level, was a multi-class categorical variable representing varying degrees of obesity and non-obesity. Since logistic regression is designed for binary classification, the target variable was converted into a binary format. Individuals with a BMI ≥ 30 were classified as obese (1), while others were classified as non-obese (0). This transformation ensured compatibility with the logistic regression framework. Following this adjustment, the target variable was converted into a factor, and the dataset was reviewed for consistency and readiness for analysis.

Code
dataset <- read.csv(here("data/processed/dataset.csv"))
dataset$BMI <- dataset$weight / (dataset$height^2)
head(dataset$BMI) %>%
  tibble::enframe(name = "Row", value = "BMI") %>%
  kable(format = "html", caption = "First 6 BMI Values") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5")
First 6 BMI Values
Row BMI
1 24.38653
2 24.23823
3 23.76543
4 26.85185
5 28.34238
6 20.19509
Code
summary_BMI <- summary(dataset$BMI)
summary_BMI %>%
  tibble::enframe(name = "Statistic", value = "Value") %>%
  kable(format = "html", caption = "Summary Statistics for BMI") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5")
Summary Statistics for BMI
Statistic Value
Min. 13.00
1st Qu. 24.37
Median 28.90
Mean 29.77
3rd Qu. 36.10
Max. 50.81
Code
dataset$Obesity <- ifelse(dataset$BMI >= 30, 1, 0)
dataset$Obesity <- as.factor(dataset$Obesity)
table(dataset$Obesity) %>%
  as.data.frame() %>%
  kable(format = "html", caption = "Frequency: obesity categories") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5")
Frequency: obesity categories
Var1 Freq
0 1113
1 974
Code
head(dataset) %>%
  kable(format = "html", caption = "First 6 Rows of Updated Dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
First 6 Rows of Updated Dataset
gender age height weight family_hist caloric_food vegetable_food nb_meal_day food_btw_meals smoke ch2o calorie_check physical_act use_tech freq_alcohol m_trans obesity_lev BMI Obesity
Female 21 1.62 64.0 yes no 2 3 Sometimes no 2 no 0 1 No Public_Transportation Normal_Weight 24.38653 0
Female 21 1.52 56.0 yes no 3 3 Sometimes yes 3 yes 3 0 Sometimes Public_Transportation Normal_Weight 24.23823 0
Male 23 1.80 77.0 yes no 2 3 Sometimes no 2 no 2 1 Frequently Public_Transportation Normal_Weight 23.76543 0
Male 27 1.80 87.0 no no 3 3 Sometimes no 2 no 2 0 Frequently Walking Overweight_Level_I 26.85185 0
Male 22 1.78 89.8 no no 2 1 Sometimes no 2 no 0 0 Sometimes Public_Transportation Overweight_Level_II 28.34238 0
Male 29 1.62 53.0 no yes 2 3 Sometimes no 2 no 0 0 Sometimes Automobile Normal_Weight 20.19509 0

Twelve predictors associated with obesity-related behaviors, dietary habits, physical activity, and lifestyle factors were selected for analysis. These variables, along with the binary target variable Obesity (1 = obese, 0 = not obese), formed the dataset for logistic regression modeling. The dataset was reviewed to ensure correct structure and readiness for analysis.

Code
predictors <- c("family_hist", "caloric_food", "vegetable_food", "nb_meal_day", "food_btw_meals", "smoke", "ch2o", "calorie_check", "physical_act", "use_tech", "freq_alcohol", "m_trans")
model_data <- dataset[, c("Obesity", predictors)]

str_output <- capture.output(str(model_data))
str_table <- data.frame(Structure = str_output, stringsAsFactors = FALSE)

str_table %>%
  kable(format = "html", caption = "Structure of Model Data") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Structure of Model Data
Structure
'data.frame': 2087 obs. of 13 variables:
$ Obesity : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ family_hist : chr "yes" "yes" "yes" "no" ...
$ caloric_food : chr "no" "no" "no" "no" ...
$ vegetable_food: num 2 3 2 3 2 2 3 2 3 2 ...
$ nb_meal_day : num 3 3 3 3 1 3 3 3 3 3 ...
$ food_btw_meals: chr "Sometimes" "Sometimes" "Sometimes" "Sometimes" ...
$ smoke : chr "no" "yes" "no" "no" ...
$ ch2o : num 2 3 2 2 2 2 2 2 2 2 ...
$ calorie_check : chr "no" "yes" "no" "no" ...
$ physical_act : num 0 3 2 2 0 0 1 3 1 1 ...
$ use_tech : num 1 0 1 0 0 0 0 0 1 1 ...
$ freq_alcohol : chr "No" "Sometimes" "Frequently" "Frequently" ...
$ m_trans : chr "Public_Transportation" "Public_Transportation" "Public_Transportation" "Walking" ...
Code
head(model_data) %>%
  kable(format = "html", caption = "First 6 Rows of Model Data") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
First 6 Rows of Model Data
Obesity family_hist caloric_food vegetable_food nb_meal_day food_btw_meals smoke ch2o calorie_check physical_act use_tech freq_alcohol m_trans
0 yes no 2 3 Sometimes no 2 no 0 1 No Public_Transportation
0 yes no 3 3 Sometimes yes 3 yes 3 0 Sometimes Public_Transportation
0 yes no 2 3 Sometimes no 2 no 2 1 Frequently Public_Transportation
0 no no 3 3 Sometimes no 2 no 2 0 Frequently Walking
0 no no 2 1 Sometimes no 2 no 0 0 Sometimes Public_Transportation
0 no yes 2 3 Sometimes no 2 no 0 0 Sometimes Automobile

Model Development

Three regression models were employed to ensure a systematic and robust approach to predictor selection and model development. The null model, containing only the intercept, served as a baseline to represent predictions without the influence of any predictors. This provided a reference point to evaluate how much additional explanatory power was gained by including predictors.

The full model, incorporating all predictors, represented the maximum complexity allowable within the dataset. This model helped understand the potential contribution of each variable but carried the risk of overfitting due to its complexity.

The stepwise model, guided by the Akaike Information Criterion (AIC), balances simplicity against fit. Starting from the null model, the procedure iteratively adds or removes predictors, keeping a change only when it lowers the AIC, and stops when no single addition or removal improves it further. The selected model therefore retains the predictors that improve fit enough to justify their added complexity, although AIC-based selection does not guarantee that every retained term is statistically significant at conventional levels. Using these three models allowed for a thorough comparison and the development of a parsimonious predictive model.
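
The selection procedure described above can be illustrated on simulated data. This is a minimal sketch, not the report's analysis: the data frame sim and its columns x1, x2, and noise are hypothetical, and the outcome is generated so that only x1 and x2 matter.

```r
# Minimal sketch of AIC-guided stepwise selection on simulated data.
# The data frame `sim` and its columns are hypothetical.
set.seed(42)
n <- 500
sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  noise = rnorm(n)
)
# The outcome depends on x1 and x2 only; `noise` is unrelated to y
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * sim$x1 - 0.8 * sim$x2))

null_fit <- glm(y ~ 1, data = sim, family = binomial)
full_fit <- glm(y ~ ., data = sim, family = binomial)

# Start from the null model; each step adds or drops the term that lowers AIC most
step_fit <- step(null_fit,
                 scope = list(lower = formula(null_fit), upper = formula(full_fit)),
                 direction = "both", trace = FALSE)

aic_table <- data.frame(
  model = c("null", "full", "stepwise"),
  AIC = c(AIC(null_fit), AIC(full_fit), AIC(step_fit))
)
print(aic_table)
print(formula(step_fit))
```

Comparing the three AIC values side by side shows how much of the full model's improvement over the null model the selected subset retains.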

Code
full_model <- glm(Obesity ~ ., data = model_data, family = binomial)
null_model <- glm(Obesity ~ 1, data = model_data, family = binomial)
stepwise_model <- step(null_model, scope = list(lower = null_model, upper = full_model), direction = "both", trace = FALSE)

The three fitted logistic regression models are summarized below.

Full Model

Code
summary(full_model)

Call:
glm(formula = Obesity ~ ., family = binomial, data = model_data)

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  -15.37085  324.74530  -0.047 0.962249    
family_histyes                 3.68053    0.37463   9.825  < 2e-16 ***
caloric_foodyes                2.07254    0.25814   8.029 9.86e-16 ***
vegetable_food                 0.87460    0.11248   7.776 7.50e-15 ***
nb_meal_day                    0.03726    0.07761   0.480 0.631165    
food_btw_mealsFrequently      -2.04138    0.59237  -3.446 0.000569 ***
food_btw_mealsNo              -0.71212    0.90552  -0.786 0.431618    
food_btw_mealsSometimes        1.27343    0.45814   2.780 0.005443 ** 
smokeyes                       1.04931    0.46080   2.277 0.022778 *  
ch2o                           0.16276    0.10063   1.617 0.105793    
calorie_checkyes              -2.65916    0.63189  -4.208 2.57e-05 ***
physical_act                  -0.32775    0.07187  -4.560 5.11e-06 ***
use_tech                      -0.39306    0.09663  -4.068 4.74e-05 ***
freq_alcoholFrequently         5.95748  324.74480   0.018 0.985364    
freq_alcoholNo                 6.54211  324.74462   0.020 0.983927    
freq_alcoholSometimes          6.70105  324.74464   0.021 0.983537    
m_transBike                    0.28103    1.43250   0.196 0.844466    
m_transMotorbike               1.60267    0.93760   1.709 0.087389 .  
m_transPublic_Transportation   0.57136    0.12964   4.407 1.05e-05 ***
m_transWalking                -1.90502    0.65158  -2.924 0.003459 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2883.9  on 2086  degrees of freedom
Residual deviance: 1930.7  on 2067  degrees of freedom
AIC: 1970.7

Number of Fisher Scoring iterations: 11
Code
coef_table <- coef(summary(full_model)) %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Predictor")

coef_table %>%
  kable(format = "html", caption = "Coefficients of the Full Logistic Regression Model") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Coefficients of the Full Logistic Regression Model
Predictor Estimate Std. Error z value Pr(>|z|)
(Intercept) -15.3708461 324.7452987 -0.0473320 0.9622486
family_histyes 3.6805286 0.3746264 9.8245297 0.0000000
caloric_foodyes 2.0725408 0.2581442 8.0286177 0.0000000
vegetable_food 0.8746010 0.1124784 7.7757248 0.0000000
nb_meal_day 0.0372577 0.0776059 0.4800884 0.6311646
food_btw_mealsFrequently -2.0413821 0.5923659 -3.4461506 0.0005686
food_btw_mealsNo -0.7121231 0.9055177 -0.7864264 0.4316177
food_btw_mealsSometimes 1.2734295 0.4581389 2.7795707 0.0054431
smokeyes 1.0493108 0.4608027 2.2771367 0.0227781
ch2o 0.1627569 0.1006291 1.6173944 0.1057932
calorie_checkyes -2.6591572 0.6318897 -4.2082618 0.0000257
physical_act -0.3277453 0.0718680 -4.5603756 0.0000051
use_tech -0.3930575 0.0966253 -4.0678543 0.0000474
freq_alcoholFrequently 5.9574819 324.7447982 0.0183451 0.9853635
freq_alcoholNo 6.5421132 324.7446160 0.0201454 0.9839274
freq_alcoholSometimes 6.7010533 324.7446385 0.0206348 0.9835369
m_transBike 0.2810325 1.4324967 0.1961837 0.8444664
m_transMotorbike 1.6026737 0.9376006 1.7093351 0.0873889
m_transPublic_Transportation 0.5713577 0.1296424 4.4071834 0.0000105
m_transWalking -1.9050177 0.6515824 -2.9236788 0.0034592
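
Logistic regression coefficients are expressed on the log-odds scale, so exponentiating an estimate gives the multiplicative change in the odds of obesity for a one-unit increase in that predictor (or for "yes" versus "no"), holding the other predictors fixed. The sketch below applies this to four estimates copied from the full-model table; the vector name log_odds is introduced here for illustration.

```r
# Log-odds estimates copied from the full-model coefficient table
log_odds <- c(
  family_histyes  = 3.6805286,
  caloric_foodyes = 2.0725408,
  physical_act    = -0.3277453,
  use_tech        = -0.3930575
)

# Odds ratio = exp(coefficient); values above 1 raise the odds, values below 1 lower them
odds_ratios <- exp(log_odds)
print(round(odds_ratios, 2))
```

Odds ratios describe associations within this model and dataset; they should not be read as causal effects.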

Null Model

Code
null_model_summary <- summary(null_model)

null_model_coef_table <- coef(null_model_summary) %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Predictor")

null_model_coef_table %>%
  kable(format = "html", caption = "Coefficients of the Null Logistic Regression Model") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Coefficients of the Null Logistic Regression Model
Predictor Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.133403 0.0438767 -3.040409 0.0023626

Stepwise Model

Code
stepwise_summary <- summary(stepwise_model)

stepwise_coef_table <- coef(stepwise_summary) %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Predictor")

stepwise_coef_table %>%
  kable(format = "html", caption = "Coefficients of the Stepwise Logistic Regression Model") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Coefficients of the Stepwise Logistic Regression Model
Predictor Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.7956227 0.7141792 -12.3157087 0.0000000
family_histyes 3.6796264 0.3746724 9.8209158 0.0000000
food_btw_mealsFrequently -2.0244197 0.5873593 -3.4466462 0.0005676
food_btw_mealsNo -0.6183034 0.8999773 -0.6870211 0.4920694
food_btw_mealsSometimes 1.3604530 0.4559138 2.9840138 0.0028449
caloric_foodyes 2.1168040 0.2566262 8.2485893 0.0000000
vegetable_food 0.8886854 0.1115908 7.9637892 0.0000000
m_transBike 0.3743789 1.4395954 0.2600584 0.7948187
m_transMotorbike 1.6429866 0.9411341 1.7457517 0.0808541
m_transPublic_Transportation 0.6015745 0.1282129 4.6919960 0.0000027
m_transWalking -1.8531534 0.6503271 -2.8495711 0.0043778
calorie_checkyes -2.6730603 0.6307480 -4.2379214 0.0000226
physical_act -0.3462794 0.0705278 -4.9098293 0.0000009
use_tech -0.4174300 0.0956562 -4.3638577 0.0000128
smokeyes 1.0211075 0.4549016 2.2446777 0.0247888
ch2o 0.1679017 0.0994235 1.6887520 0.0912670

Evaluation

To evaluate the stepwise-selected model, predicted probabilities of obesity were generated for all individuals in the dataset and converted into binary classifications using a threshold of 0.5. A confusion matrix was then constructed to assess the model’s performance through accuracy, sensitivity, specificity, precision, and F1-score. Because caret's confusionMatrix() treats the first factor level as the positive class by default, the class-level metrics reported below take "0" (not obese) as the positive class.

Code
library(caret)
library(pROC)
# Predict probabilities from the stepwise model
predicted_probs <- predict(stepwise_model, type = "response")

# Convert probabilities to binary classes (using a threshold of 0.5)
predicted_classes <- ifelse(predicted_probs >= 0.5, 1, 0)

# Create a confusion matrix (comparison of predicted vs. actual values)
conf_matrix <- confusionMatrix(as.factor(predicted_classes), model_data$Obesity)

# caret stores predictions in the first table dimension and actual (reference) values in the second
conf_matrix_table <- as.data.frame(conf_matrix$table)
colnames(conf_matrix_table) <- c("Predicted", "Actual", "Count")
# Percentage = share of each predicted class accounted for by each actual class
conf_matrix_table <- conf_matrix_table %>%
  group_by(Predicted) %>%
  mutate(Percentage = round((Count / sum(Count)) * 100, 2)) %>%
  ungroup()

conf_matrix_table %>%
  kable(format = "html", caption = "Confusion Matrix: Predicted vs Actual Values with Percentages") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  column_spec(1, bold = TRUE, width = "150px") %>%
  column_spec(2, width = "150px") %>%
  column_spec(3, width = "100px") %>%
  column_spec(4, width = "100px") %>%
  scroll_box(width = "100%", height = "400px")
Confusion Matrix: Predicted vs Actual Values with Percentages
Predicted Actual Count Percentage
0 0 749 84.35
1 0 364 30.36
0 1 139 15.65
1 1 835 69.64
Code
performance_metrics_vertical <- as.data.frame(conf_matrix$overall) %>%
  tibble::rownames_to_column("Metric") %>%
  tidyr::pivot_longer(cols = -Metric, names_to = NULL, values_to = "Value")
performance_metrics_vertical %>%
  kable(format = "html", caption = "Confusion Matrix Overall Performance Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Confusion Matrix Overall Performance Metrics
Metric Value
Accuracy 0.7589842
Kappa 0.5227055
AccuracyLower 0.7400388
AccuracyUpper 0.7771994
AccuracyNull 0.5333014
AccuracyPValue 0.0000000
McnemarPValue 0.0000000
Code
class_metrics_vertical <- as.data.frame(conf_matrix$byClass) %>%
  tibble::rownames_to_column("Metric") %>%
  tidyr::pivot_longer(cols = -Metric, names_to = NULL, values_to = "Value")
class_metrics_vertical %>%
  kable(format = "html", caption = "Class-Level Performance Metrics") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(0, bold = TRUE, background = "#f5f5f5") %>%
  scroll_box(width = "100%", height = "400px")
Class-Level Performance Metrics
Metric Value
Sensitivity 0.6729560
Specificity 0.8572895
Pos Pred Value 0.8434685
Neg Pred Value 0.6964137
Precision 0.8434685
Recall 0.6729560
F1 0.7486257
Prevalence 0.5333014
Detection Rate 0.3588884
Detection Prevalence 0.4254911
Balanced Accuracy 0.7651228
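
Since the class-level metrics above are computed with "0" (not obese) as the positive class, it can help to recompute them with obese as the positive class. The sketch below does this by hand from the four confusion-matrix counts, read against the class totals of 1113 non-obese and 974 obese individuals; the variable names tp, fn, fp, and tn are introduced here for illustration.

```r
# Counts from the confusion matrix, taking obese ("1") as the positive class
tp <- 835  # obese, predicted obese
fn <- 139  # obese, predicted not obese
fp <- 364  # not obese, predicted obese
tn <- 749  # not obese, predicted not obese

accuracy  <- (tp + tn) / (tp + tn + fp + fn)
recall    <- tp / (tp + fn)  # sensitivity for the obese class
precision <- tp / (tp + fp)  # share of predicted-obese cases that are truly obese
f1        <- 2 * precision * recall / (precision + recall)

metrics <- c(accuracy = accuracy, recall = recall, precision = precision, f1 = f1)
print(round(metrics, 4))
```

Under this convention, recall for the obese class equals the Specificity row of the table (0.857) and precision equals the Neg Pred Value row (0.696), while accuracy is unchanged at 0.759.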
Code
# Compute ROC curve using the actual class labels and predicted probabilities
roc_curve <- pROC::roc(model_data$Obesity, predicted_probs)
auc_value <- pROC::auc(roc_curve)
print(auc_value)
Area under the curve: 0.8556

Additionally, the ROC curve and AUC were calculated to further evaluate the model’s discriminative ability. The ROC curve visualizes the trade-off between sensitivity and specificity, while the AUC quantifies the model’s ability to distinguish between obese and non-obese individuals.
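
One way to read the AUC is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes the AUC from this definition in base R on a four-observation toy example; the function name auc_pairwise and the toy vectors are hypothetical and are not part of the report's code, which uses pROC.

```r
# AUC from its pairwise definition: the share of (positive, negative) pairs in which
# the positive case has the higher score, counting ties as one half
auc_pairwise <- function(labels, scores) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  wins <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")
  mean(wins)
}

# Toy example: two negatives and two positives
toy_labels <- c(0, 0, 1, 1)
toy_scores <- c(0.10, 0.40, 0.35, 0.80)
toy_auc <- auc_pairwise(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 positive-negative pairs are ordered correctly: 0.75
```

The pairwise computation grows with the product of the two class sizes, so a dedicated package remains the practical choice for larger datasets.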

Results Visualization

To assess the distribution of predicted probabilities, a scatter plot was created with observations color-coded by their actual class. This visualization provides a clear overview of the model’s predictions and potential misclassifications.

Code
plot(predicted_probs, col = ifelse(model_data$Obesity == 1, "blue", "red"), pch = 16, xlab = "n° Observation", ylab = "Predicted Probability", main = "Predicted Probabilities of Obesity", cex = 0.6)
legend("bottomright", legend = c("Obese", "Not Obese"), col = c("blue", "red"), pch = 16)

For additional clarity, the ROC curve was plotted to visually represent the model’s performance.

Code
# legacy.axes = TRUE labels the x-axis as 1 - specificity (the false positive rate), increasing left to right
plot(roc_curve, col = "blue", main = "ROC Curve", lwd = 3, legacy.axes = TRUE, ylim = c(0, 1.05), xlab = "False Positive Rate", ylab = "True Positive Rate", cex.main = 1.5, cex.lab = 1.2, cex.axis = 1.1)
legend("bottomright", legend = paste("AUC =", round(auc_value, 3)), lwd = 0, cex = 1.2, bty = "n")
grid()

Predicting Obesity Probability

To test the model’s ability to estimate the probability that an individual is obese, six distinct profiles were created, representing a diverse range of lifestyles. Each profile was designed to highlight specific behavioral, dietary, and lifestyle patterns.

The first individual represents a high-risk case for obesity. This person has a family history of being overweight, frequently consumes high-calorie foods and snacks, and eats very few vegetables. They have five meals a day, drink only 0.5 liters of water daily, and do no physical activity. Additionally, they spend 10 hours a day using technology, consume alcohol consistently, and rely primarily on public transportation for mobility.

The second individual exemplifies a very healthy lifestyle. They have no family history of being overweight, rarely consume high-calorie foods or snacks, and eat a large amount of vegetables. Their diet consists of very few meals per day, complemented by a high water intake of 4 liters daily. They do not monitor calorie intake but engage in physical activity five times a week. They walk as their primary mode of transportation, do not consume alcohol, and spend only 0.5 hours daily using technology.

The third individual exhibits a balanced lifestyle but shows some risk factors. This person has a family history of being overweight, frequently consumes snacks and high-calorie foods, and eats a moderate amount of vegetables. They have three meals a day, drink 1 liter of water, and monitor their calorie intake. However, they engage in physical activity only once a week, use technology for 8 hours daily, use a motorbike for transportation, and occasionally consume alcohol.

The fourth individual is physically active and health-conscious. They have no family history of being overweight, do not frequently consume high-calorie foods, but snack occasionally. They eat a lot of vegetables, have a small number of meals per day, and drink 3 liters of water daily. They do not monitor calorie intake but exercise three times a week and use a bicycle for transportation. They consume alcohol frequently but spend only an hour daily using technology.

The fifth individual represents another high-risk case due to a sedentary lifestyle. They have a family history of being overweight, frequently consume high-calorie foods and snacks, and eat very few vegetables. They have four meals a day, drink 2 liters of water, and do no physical activity. They spend 6 hours daily using technology, consume alcohol moderately, and rely on public transportation.

The sixth individual leads a very active lifestyle but has a risk factor in their transportation choice. They have no family history of being overweight, do not frequently consume high-calorie foods or snacks, and eat a large amount of vegetables. They have two meals per day, drink 1.5 liters of water, and do not monitor calorie intake. They engage in physical activity four times a week, use a motorbike for transportation, do not consume alcohol, and spend 2 hours daily using technology.

These six profiles were designed to probe how the model responds to a range of contrasting lifestyle scenarios, from clearly high-risk to clearly low-risk combinations of factors.
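
Scoring hypothetical profiles with a fitted logistic model relies on passing them to predict() through the newdata argument. The following is a minimal sketch on simulated data rather than the report's dataset: the data frames train and profiles, the object fit, and the coefficients used to simulate the outcome are all assumptions made for illustration, with column names borrowed from the report.

```r
# Minimal sketch: score hypothetical profiles with a fitted logistic model.
# Training data are simulated; column names echo the report but the values are made up.
set.seed(1)
n <- 400
train <- data.frame(
  family_hist  = sample(c("no", "yes"), n, replace = TRUE),
  physical_act = sample(0:3, n, replace = TRUE),
  use_tech     = sample(0:2, n, replace = TRUE)
)
lin_pred <- -1 + 1.5 * (train$family_hist == "yes") - 0.4 * train$physical_act
train$Obesity <- factor(rbinom(n, size = 1, prob = plogis(lin_pred)))

fit <- glm(Obesity ~ family_hist + physical_act + use_tech,
           data = train, family = binomial)

# New profiles must use the same column names and category labels as the training data
profiles <- data.frame(
  family_hist  = c("yes", "no"),
  physical_act = c(0, 3),
  use_tech     = c(2, 0)
)

# With newdata supplied, predict() returns one probability per profile,
# not one per training observation
profile_probs <- predict(fit, newdata = profiles, type = "response")
print(data.frame(Profile = seq_along(profile_probs),
                 Predicted_Probability = round(profile_probs, 3)))
```

Calling predict() without newdata returns the fitted values for the training rows instead, so the length of the result is a quick check that the intended data were scored.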

Code
# Six hypothetical profiles; category labels follow those used in the dataset
new_data <- data.frame(
  family_hist = factor(c("yes", "no", "yes", "no", "yes", "no"),
                       levels = c("yes", "no")),
  caloric_food = factor(c("yes", "no", "yes", "no", "yes", "no"),
                        levels = c("yes", "no")),
  vegetable_food = c(1, 5, 2, 4, 1, 3),
  nb_meal_day = c(5, 1, 3, 2, 4, 2),
  food_btw_meals = factor(c("Frequently", "Sometimes", "Always", "Sometimes", "Frequently", "Always"),
                          levels = c("Frequently", "Sometimes", "Always")),
  smoke = factor(c("no", "yes", "no", "yes", "yes", "no"),
                 levels = c("yes", "no")),
  ch2o = c(0.5, 4, 1, 3, 2, 1.5),
  calorie_check = factor(c("yes", "no", "yes", "no", "yes", "no"),
                         levels = c("yes", "no")),
  physical_act = c(0, 5, 1, 3, 0, 4),
  use_tech = c(10, 0.5, 8, 1, 6, 2),
  # The dataset codes non-drinkers as "No"; it has no "Never" level
  freq_alcohol = factor(c("Always", "No", "Sometimes", "Always", "Sometimes", "No"),
                        levels = c("Sometimes", "Frequently", "Always", "No")),
  m_trans = factor(c("Public_Transportation", "Walking", "Motorbike", "Bike", "Public_Transportation", "Motorbike"),
                   levels = c("Public_Transportation", "Walking", "Bike", "Motorbike"))
)

# predicted_probs was computed in the Evaluation step from the training data,
# so this table has one row per observation in the dataset; new_data is not scored here
probability_table <- tibble(
  Reference = seq_along(predicted_probs),
  Predicted_Probability = predicted_probs,
  Probability_Percentage = predicted_probs * 100
)

probability_table %>%
  kbl(format = "html", caption = "Predicted Probabilities") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE) %>%
  column_spec(1, width = "100px") %>%
  scroll_box(width = "100%", height = "400px")
Predicted Probabilities
Reference Predicted_Probability Probability_Percentage
1 0.1887390 18.8739011
2 0.0644648 6.4464846
3 0.1042580 10.4258030
4 0.0009303 0.0930347
5 0.0088327 0.8832714
6 0.0389708 3.8970823
7 0.9346141 93.4614119
8 0.0031435 0.3143533
9 0.7687000 76.8700007
10 0.5774465 57.7446538
11 0.0584313 5.8431319
12 0.0022560 0.2255979
13 0.0126622 1.2662178
14 0.0131879 1.3187893
15 0.7375133 73.7513268
16 0.0049885 0.4988537
17 0.6368755 63.6875453
18 0.0389708 3.8970823
19 0.2371178 23.7117773
20 0.2609967 26.0996704
21 0.0046362 0.4636165
22 0.9156167 91.5616655
23 0.7343568 73.4356776
24 0.3435450 34.3544989
25 0.6589412 65.8941180
26 0.0050754 0.5075417
27 0.0026128 0.2612763
28 0.0004526 0.0452598
29 0.1029716 10.2971635
30 0.0013780 0.1377971
31 0.0022709 0.2270932
32 0.1743688 17.4368779
33 0.0004216 0.0421632
34 0.0088327 0.8832714
35 0.0012529 0.1252890
36 0.0712124 7.1212371
37 0.0001856 0.0185575
38 0.0423790 4.2379037
39 0.5774465 57.7446538
40 0.5510893 55.1089274
41 0.6747460 67.4746006
42 0.0000259 0.0025932
43 0.5048554 50.4855424
44 0.7914037 79.1403692
45 0.0002534 0.0253366
46 0.1643617 16.4361703
47 0.0236097 2.3609674
48 0.0236097 2.3609674
49 0.0495894 4.9589423
50 0.0689060 6.8905994
51 0.3231000 32.3100047
52 0.0331470 3.3147032
53 0.0376027 3.7602730
54 0.0026943 0.2694311
55 0.0032793 0.3279283
56 0.0001654 0.0165418
57 0.4723042 47.2304167
58 0.1643617 16.4361703
59 0.2220529 22.2052949
60 0.4295940 42.9593981
61 0.4471093 44.7109294
62 0.0169584 1.6958445
63 0.0101270 1.0126980
64 0.4108566 41.0856644
65 0.0376707 3.7670683
66 0.0262119 2.6211937
67 0.1183667 11.8366663
68 0.2135853 21.3585265
69 0.0172091 1.7209072
70 0.0366473 3.6647317
71 0.2461404 24.6140448
72 0.0031853 0.3185317
73 0.0199919 1.9991920
74 0.0988492 9.8849178
75 0.0718647 7.1864741
76 0.0339008 3.3900839
77 0.0333806 3.3380583
78 0.7762441 77.6244059
79 0.6956103 69.5610347
80 0.2609967 26.0996704
81 0.0014940 0.1494011
82 0.0903916 9.0391556
83 0.3313969 33.1396929
84 0.0000105 0.0010512
85 0.0614396 6.1439555
86 0.2609967 26.0996704
87 0.0767575 7.6757481
88 0.7457381 74.5738051
89 0.0026520 0.2652021
90 0.5212945 52.1294524
91 0.0023461 0.2346106
92 0.0099218 0.9921770
93 0.0000722 0.0072221
94 0.0003376 0.0337639
95 0.0826033 8.2603282
96 0.0143501 1.4350149
97 0.1110922 11.1092213
98 0.0006204 0.0620410
99 0.0020730 0.2072987
100 0.1320638 13.2063759
101 0.0118247 1.1824726
102 0.0000475 0.0047455
103 0.1459834 14.5983434
104 0.6164329 61.6432902
105 0.0357026 3.5702610
106 0.1867375 18.6737524
107 0.0054791 0.5479106
108 0.7354802 73.5480201
109 0.8245183 82.4518315
110 0.7126128 71.2612776
111 0.0656727 6.5672742
112 0.0106198 1.0619777
113 0.1459834 14.5983434
114 0.1373344 13.7334429
115 0.0736701 7.3670114
116 0.0055446 0.5544567
117 0.5774465 57.7446538
118 0.0169643 1.6964269
119 0.0101775 1.0177524
120 0.0174226 1.7422636
121 0.0014924 0.1492378
122 0.8345765 83.4576463
123 0.1320638 13.2063759
124 0.4293647 42.9364687
125 0.0497414 4.9741403
126 0.0102887 1.0288673
127 0.1696542 16.9654239
128 0.0902374 9.0237417
129 0.0442529 4.4252882
130 0.0053007 0.5300697
131 0.0165388 1.6538806
132 0.0210600 2.1060012
133 0.6230842 62.3084167
134 0.5319951 53.1995091
135 0.5774465 57.7446538
136 0.7666743 76.6674287
137 0.0050477 0.5047737
138 0.0016490 0.1649026
139 0.1175347 11.7534707
140 0.0225780 2.2577975
141 0.0015003 0.1500331
142 0.0769585 7.6958500
143 0.0062223 0.6222311
144 0.0084865 0.8486539
145 0.0001594 0.0159385
146 0.2846350 28.4634967
147 0.0246755 2.4675504
148 0.6616289 66.1628855
149 0.4915059 49.1505931
150 0.6616289 66.1628855
151 0.5337211 53.3721064
152 0.1027509 10.2750872
153 0.7457381 74.5738051
154 0.0581149 5.8114859
155 0.0088024 0.8802411
156 0.0317115 3.1711451
157 0.3576032 35.7603244
158 0.6164329 61.6432902
159 0.3991777 39.9177733
160 0.0278827 2.7882675
161 0.7176410 71.7641028
162 0.0000179 0.0017939
163 0.0061775 0.6177508
164 0.3696151 36.9615127
165 0.1077304 10.7730378
166 0.0092242 0.9224162
167 0.6747460 67.4746006
168 0.0440096 4.4009623
169 0.0238087 2.3808735
170 0.0656727 6.5672742
171 0.0066078 0.6607797
172 0.0009847 0.0984663
173 0.5947051 59.4705053
174 0.5224999 52.2499929
175 0.0058696 0.5869631
176 0.1271396 12.7139647
177 0.0497414 4.9741403
178 0.0775032 7.7503172
179 0.0040009 0.4000870
180 0.8731625 87.3162521
181 0.0083295 0.8329483
182 0.1393443 13.9344289
183 0.0602795 6.0279450
184 0.1296561 12.9656107
185 0.5774465 57.7446538
186 0.0497414 4.9741403
187 0.0226499 2.2649895
188 0.0134176 1.3417586
189 0.0000081 0.0008126
190 0.0918615 9.1861524
191 0.2220529 22.2052949
192 0.0795022 7.9502214
193 0.7003182 70.0318244
194 0.0351745 3.5174524
195 0.5360370 53.6036951
196 0.4640025 46.4002487
197 0.2584731 25.8473094
198 0.9168632 91.6863169
199 0.1393443 13.9344289
200 0.1979330 19.7933012
201 0.8520584 85.2058391
202 0.0473617 4.7361663
203 0.0002047 0.0204729
204 0.5774465 57.7446538
205 0.0614396 6.1439555
206 0.0283263 2.8326330
207 0.2609967 26.0996704
208 0.8345765 83.4576463
209 0.6747460 67.4746006
210 0.7854538 78.5453765
211 0.0008837 0.0883665
212 0.0000031 0.0003149
213 0.0488778 4.8877846
214 0.7104626 71.0462561
215 0.0003584 0.0358369
216 0.5360370 53.6036951
217 0.4843260 48.4325958
218 0.1681836 16.8183567
219 0.7457381 74.5738051
220 0.0418267 4.1826681
221 0.0220850 2.2085018
222 0.0773691 7.7369142
223 0.0138516 1.3851568
224 0.1945609 19.4560943
225 0.2660630 26.6062960
226 0.9590544 95.9054418
227 0.0887722 8.8772211
228 0.2497468 24.9746802
229 0.1686387 16.8638700
230 0.0238435 2.3843532
231 0.0001045 0.0104499
232 0.7375133 73.7513268
233 0.0902374 9.0237417
234 0.0184243 1.8424301
235 0.0391874 3.9187372
236 0.0482932 4.8293229
237 0.0685255 6.8525454
238 0.7887096 78.8709590
239 0.0120518 1.2051829
240 0.0107484 1.0748402
241 0.1229852 12.2985176
242 0.0006581 0.0658065
243 0.0281589 2.8158877
244 0.1781589 17.8158871
245 0.7687000 76.8700007
246 0.1862645 18.6264548
247 0.2753709 27.5370898
248 0.5599968 55.9996850
249 0.4620458 46.2045841
250 0.0393349 3.9334894
251 0.2280821 22.8082138
252 0.0002233 0.0223312
253 0.0127938 1.2793818
254 0.0222507 2.2250686
255 0.0423790 4.2379037
256 0.0000355 0.0035497
257 0.0333323 3.3332343
258 0.0003254 0.0325413
259 0.0050134 0.5013372
260 0.0002872 0.0287194
261 0.0060610 0.6060979
262 0.0033302 0.3330227
263 0.0009911 0.0991070
264 0.1423205 14.2320550
265 0.0545190 5.4518954
266 0.0009190 0.0919049
267 0.0238087 2.3808735
268 0.2196939 21.9693865
269 0.0024768 0.2476791
270 0.1373344 13.7334429
271 0.0767575 7.6757481
272 0.0008216 0.0821557
273 0.7962664 79.6266355
274 0.0099968 0.9996780
275 0.0166477 1.6647673
276 0.0903916 9.0391556
277 0.0376707 3.7670683
278 0.0002983 0.0298263
279 0.3597647 35.9764721
280 0.3888033 38.8803275
281 0.0283263 2.8326330
282 0.5510893 55.1089274
283 0.0619373 6.1937340
284 0.0020397 0.2039657
285 0.0317115 3.1711451
286 0.0000710 0.0070968
287 0.0376707 3.7670683
288 0.0083929 0.8392851
289 0.7142217 71.4221744
290 0.0004261 0.0426070
291 0.0559923 5.5992274
292 0.0024370 0.2436983
293 0.0524413 5.2441266
294 0.0613315 6.1331457
295 0.1479953 14.7995326
296 0.0238087 2.3808735
297 0.0277525 2.7752544
298 0.0123521 1.2352100
299 0.1210120 12.1011971
300 0.7285203 72.8520259
301 0.1183667 11.8366663
302 0.0351745 3.5174524
303 0.0265959 2.6595908
304 0.0317115 3.1711451
305 0.0044061 0.4406148
306 0.0510926 5.1092591
307 0.1259956 12.5995590
308 0.0091588 0.9158786
309 0.0186319 1.8631943
310 0.0044233 0.4423252
311 0.0773691 7.7369142
312 0.0588828 5.8882793
313 0.0220850 2.2085018
314 0.4602173 46.0217319
315 0.0031327 0.3132687
316 0.0003672 0.0367225
317 0.3625292 36.2529215
318 0.7126128 71.2612776
319 0.0423790 4.2379037
320 0.7162339 71.6233889
321 0.0240123 2.4012302
322 0.0199338 1.9933792
323 0.7988887 79.8888671
324 0.0016962 0.1696153
325 0.4915059 49.1505931
326 0.1512285 15.1228521
327 0.0000880 0.0088029
328 0.0000531 0.0053131
329 0.0002872 0.0287194
330 0.0464840 4.6484040
331 0.0031771 0.3177100
332 0.0037993 0.3799301
333 0.0002602 0.0260163
334 0.0773691 7.7369142
335 0.0074777 0.7477711
336 0.8345765 83.4576463
337 0.0076106 0.7610597
338 0.0737702 7.3770240
339 0.0106483 1.0648263
340 0.0852531 8.5253106
341 0.1219036 12.1903553
342 0.0372401 3.7240138
343 0.2931257 29.3125674
344 0.0039224 0.3922396
345 0.0002388 0.0238841
346 0.0004795 0.0479469
347 0.3032899 30.3289855
348 0.0222099 2.2209906
349 0.0971664 9.7166438
350 0.0004337 0.0433696
351 0.2065478 20.6547768
352 0.1037207 10.3720717
353 0.1998753 19.9875282
354 0.0009868 0.0986756
355 0.5947051 59.4705053
356 0.2497468 24.9746802
357 0.0238087 2.3808735
358 0.1987011 19.8701061
359 0.3377092 33.7709187
360 0.3625292 36.2529215
361 0.0005842 0.0584156
362 0.6164329 61.6432902
363 0.0001711 0.0171137
364 0.0283263 2.8326330
365 0.0373284 3.7328391
366 0.0349161 3.4916067
367 0.0311144 3.1114359
368 0.0176289 1.7628948
369 0.0435710 4.3570967
370 0.1660140 16.6014031
371 0.0001407 0.0140663
372 0.0016962 0.1696153
373 0.2100053 21.0005342
374 0.0145212 1.4521221
375 0.0927392 9.2739210
376 0.0066201 0.6620124
377 0.1130660 11.3065974
378 0.0152794 1.5279358
379 0.0011669 0.1166944
380 0.0009500 0.0949953
381 0.1922660 19.2265999
382 0.2280821 22.8082138
383 0.5466945 54.6694539
384 0.2786029 27.8602941
385 0.0005039 0.0503925
386 0.0001885 0.0188519
387 0.0196519 1.9651923
388 0.6202660 62.0266048
389 0.1562532 15.6253240
390 0.1159275 11.5927475
391 0.0029571 0.2957090
392 0.6073866 60.7386575
393 0.0017558 0.1755757
394 0.2609967 26.0996704
395 0.0087685 0.8768524
396 0.0067940 0.6794023
397 0.3576032 35.7603244
398 0.0093641 0.9364137
399 0.0076106 0.7610597
400 0.0011370 0.1136955
401 0.0053572 0.5357174
402 0.4620458 46.2045841
403 0.7354802 73.5480201
404 0.0041350 0.4134975
405 0.0014940 0.1494011
406 0.0009709 0.0970873
407 0.0004350 0.0434986
408 0.1123301 11.2330141
409 0.1927685 19.2768528
410 0.0028716 0.2871552
411 0.7687000 76.8700007
412 0.0826033 8.2603282
413 0.5048554 50.4855424
414 0.1189408 11.8940803
415 0.1189408 11.8940803
416 0.0286545 2.8654505
417 0.0902711 9.0271069
418 0.0002807 0.0280709
419 0.1987011 19.8701061
420 0.0048872 0.4887238
421 0.0087274 0.8727362
422 0.0016962 0.1696153
423 0.0022430 0.2243036
424 0.0423790 4.2379037
425 0.0008257 0.0825679
426 0.0025726 0.2572593
427 0.5360370 53.6036951
428 0.0003202 0.0320172
429 0.4471093 44.7109294
430 0.0331470 3.3147032
431 0.0437114 4.3711394
432 0.0314776 3.1477610
433 0.0016490 0.1649026
434 0.0331470 3.3147032
435 0.0590104 5.9010352
436 0.1423205 14.2320550
437 0.5599968 55.9996850
438 0.5947051 59.4705053
439 0.0216607 2.1660698
440 0.1078650 10.7865014
441 0.4915059 49.1505931
442 0.7235086 72.3508642
443 0.2074944 20.7494445
444 0.0773691 7.7369142
445 0.0083929 0.8392851
446 0.0005407 0.0540727
447 0.0769585 7.6958500
448 0.0971664 9.7166438
449 0.0009868 0.0986756
450 0.0060610 0.6060979
451 0.0588828 5.8882793
452 0.0074233 0.7423295
453 0.0348207 3.4820722
454 0.5599968 55.9996850
455 0.2919322 29.1932227
456 0.0895872 8.9587237
457 0.0002388 0.0238841
458 0.0376707 3.7670683
459 0.0733772 7.3377180
460 0.0036333 0.3633258
461 0.1059910 10.5990989
462 0.0008865 0.0886513
463 0.1252824 12.5282437
464 0.0003420 0.0342014
465 0.0238524 2.3852379
466 0.0423790 4.2379037
467 0.0032586 0.3258635
468 0.8345765 83.4576463
469 0.0065297 0.6529659
470 0.0009868 0.0986756
471 0.0003164 0.0316360
472 0.0395837 3.9583722
473 0.1723838 17.2383799
474 0.0442529 4.4252882
475 0.0588828 5.8882793
476 0.3709072 37.0907199
477 0.0672703 6.7270252
478 0.0497414 4.9741403
479 0.2951879 29.5187910
480 0.1927685 19.2768528
481 0.0264323 2.6432345
482 0.0012135 0.1213505
483 0.0748742 7.4874169
484 0.0028169 0.2816947
485 0.0029958 0.2995816
486 0.1525251 15.2525142
487 0.2323109 23.2310908
488 0.0041350 0.4134975
489 0.7849063 78.4906274
490 0.7037444 70.3744372
491 0.8765162 87.6516173
492 0.7231903 72.3190327
493 0.8579265 85.7926470
494 0.7891057 78.9105726
495 0.8730628 87.3062836
496 0.8632738 86.3273771
497 0.7739988 77.3998818
498 0.4505038 45.0503795
499 0.4931205 49.3120490
500 0.6369346 63.6934618
501 0.0842559 8.4255913
502 0.0808745 8.0874497
503 0.0462084 4.6208440
504 0.0005010 0.0500954
505 0.0005994 0.0599353
506 0.0006648 0.0664832
507 0.0002818 0.0281849
508 0.0004177 0.0417690
509 0.0002889 0.0288945
510 0.0035566 0.3556550
511 0.0114426 1.1442584
512 0.0050639 0.5063921
513 0.0022033 0.2203260
514 0.0051290 0.5128981
515 0.0039542 0.3954162
516 0.0039174 0.3917356
517 0.0044369 0.4436941
518 0.0004737 0.0473655
519 0.0004645 0.0464499
520 0.0000639 0.0063874
521 0.7281731 72.8173086
522 0.6681158 66.8115805
523 0.6779944 67.7994436
524 0.1124686 11.2468633
525 0.1369171 13.6917137
526 0.1070297 10.7029688
527 0.0297491 2.9749119
528 0.0311935 3.1193520
529 0.0562049 5.6204865
530 0.0002056 0.0205581
531 0.0001941 0.0194120
532 0.0002677 0.0267710
533 0.0242828 2.4282820
534 0.0153611 1.5361113
535 0.0296685 2.9668451
536 0.0130702 1.3070177
537 0.0084075 0.8407458
538 0.0263429 2.6342910
539 0.0730374 7.3037370
540 0.1420182 14.2018163
541 0.2083987 20.8398710
542 0.0502414 5.0241357
543 0.0330422 3.3042157
544 0.0459778 4.5977811
545 0.0594528 5.9452786
546 0.0409256 4.0925563
547 0.0577315 5.7731516
548 0.5577299 55.7729888
549 0.2881803 28.8180302
550 0.2860731 28.6073052
551 0.0391286 3.9128589
552 0.0343409 3.4340881
553 0.0442423 4.4242341
554 0.5485611 54.8561092
555 0.5577182 55.7718196
556 0.2905686 29.0568558
557 0.1377582 13.7758217
558 0.0824715 8.2471458
559 0.1022608 10.2260815
560 0.0533217 5.3321720
561 0.0512119 5.1211851
562 0.0492339 4.9233895
563 0.0004939 0.0493886
564 0.0004231 0.0423096
565 0.0003901 0.0390146
566 0.0003534 0.0353407
567 0.0003020 0.0302034
568 0.0004257 0.0425713
569 0.3737546 37.3754554
570 0.4079621 40.7962076
571 0.4584746 45.8474613
572 0.0852295 8.5229534
573 0.1054264 10.5426351
574 0.0179423 1.7942330
575 0.0412267 4.1226676
576 0.0198807 1.9880727
577 0.0229509 2.2950861
578 0.0108690 1.0868989
579 0.0158111 1.5811093
580 0.0947356 9.4735607
581 0.4433012 44.3301162
582 0.6168104 61.6810371
583 0.3958826 39.5882606
584 0.6616289 66.1628855
585 0.5351453 53.5145298
586 0.5062480 50.6247961
587 0.6012676 60.1267564
588 0.6274157 62.7415678
589 0.4773812 47.7381166
590 0.0000232 0.0023156
591 0.0000258 0.0025806
592 0.0000380 0.0037979
593 0.0045087 0.4508680
594 0.0018215 0.1821458
595 0.0043379 0.4337945
596 0.3545598 35.4559759
597 0.4136629 41.3662855
598 0.4390495 43.9049523
599 0.4329590 43.2958973
600 0.1071433 10.7143285
601 0.0004179 0.0417912
602 0.0002618 0.0261767
603 0.0000380 0.0037960
604 0.0042871 0.4287118
605 0.0042836 0.4283562
606 0.0104352 1.0435178
607 0.7075598 70.7559771
608 0.1344027 13.4402709
609 0.0288410 2.8840978
610 0.0046548 0.4654827
611 0.0351673 3.5167318
612 0.0004400 0.0440019
613 0.1304832 13.0483225
614 0.0581940 5.8194048
615 0.0657880 6.5787989
616 0.4560512 45.6051176
617 0.0393383 3.9338266
618 0.3417246 34.1724637
619 0.0622805 6.2280492
620 0.0397583 3.9758314
621 0.0003642 0.0364173
622 0.0000192 0.0019182
623 0.5297970 52.9796966
624 0.0407852 4.0785188
625 0.0376717 3.7671678
626 0.0159679 1.5967948
627 0.6616289 66.1628855
628 0.6380169 63.8016916
629 0.6415618 64.1561814
630 0.0000328 0.0032831
631 0.0041258 0.4125845
632 0.5481467 54.8146699
633 0.5234583 52.3458279
634 0.3038238 30.3823832
635 0.0483608 4.8360826
636 0.0663546 6.6354578
637 0.0780238 7.8023806
638 0.0608584 6.0858418
639 0.0005502 0.0550175
640 0.0003577 0.0357674
641 0.0005345 0.0534468
642 0.0003474 0.0347354
643 0.0004370 0.0436956
644 0.0004268 0.0426786
645 0.0082196 0.8219560
646 0.1341856 13.4185566
647 0.0842493 8.4249270
648 0.0041326 0.4132603
649 0.0000324 0.0032387
650 0.0027711 0.2771092
651 0.0051243 0.5124261
652 0.1463500 14.6350050
653 0.1345805 13.4580541
654 0.0098131 0.9813083
655 0.7032301 70.3230066
656 0.7106260 71.0625999
657 0.4081208 40.8120765
658 0.7775465 77.7546498
659 0.0754095 7.5409533
660 0.7622463 76.2246288
661 0.0725311 7.2531092
662 0.0351568 3.5156790
663 0.0721416 7.2141648
664 0.0437494 4.3749392
665 0.0058107 0.5810690
666 0.0074970 0.7496986
667 0.0510778 5.1077792
668 0.4059410 40.5941040
669 0.0670572 6.7057222
670 0.0101911 1.0191095
671 0.0095159 0.9515892
672 0.0366464 3.6646418
673 0.0390066 3.9006637
674 0.0980352 9.8035233
675 0.1840282 18.4028237
676 0.0486275 4.8627489
677 0.0316154 3.1615352
678 0.0432548 4.3254804
679 0.0679928 6.7992764
680 0.6996893 69.9689324
681 0.7452408 74.5240767
682 0.5271537 52.7153699
683 0.4098137 40.9813692
684 0.2908704 29.0870359
685 0.0039219 0.3921890
686 0.0386601 3.8660080
687 0.0399467 3.9946651
688 0.6183099 61.8309874
689 0.5628746 56.2874621
690 0.2928584 29.2858392
691 0.1161041 11.6104149
692 0.6253386 62.5338611
693 0.1051882 10.5188230
694 0.0529485 5.2948473
695 0.0347445 3.4744502
696 0.0157168 1.5716779
697 0.0212346 2.1234608
698 0.0102972 1.0297206
699 0.0003010 0.0300960
700 0.0000239 0.0023863
701 0.0043352 0.4335200
702 0.0036662 0.3666220
703 0.3792954 37.9295406
704 0.5602114 56.0211381
705 0.5388420 53.8842034
706 0.1175373 11.7537274
707 0.1117002 11.1700214
708 0.0262072 2.6207222
709 0.0256989 2.5698936
710 0.0309443 3.0944284
711 0.0187031 1.8703110
712 0.0199831 1.9983119
713 0.0073290 0.7329030
714 0.0033366 0.3336583
715 0.3443103 34.4310295
716 0.6247288 62.4728760
717 0.4164681 41.6468095
718 0.6326818 63.2681831
719 0.5073150 50.7314987
720 0.5357338 53.5733837
721 0.4904583 49.0458329
722 0.4246694 42.4669436
723 0.5770360 57.7036020
724 0.0000267 0.0026748
725 0.0000318 0.0031820
726 0.0000359 0.0035933
727 0.0045025 0.4502496
728 0.0026475 0.2647548
729 0.0046162 0.4616225
730 0.4054739 40.5473880
731 0.4348749 43.4874947
732 0.4958315 49.5831486
733 0.0213342 2.1334242
734 0.0079254 0.7925418
735 0.0162606 1.6260595
736 0.7077582 70.7758214
737 0.0031776 0.3177631
738 0.6801257 68.0125736
739 0.0290807 2.9080749
740 0.5420186 54.2018618
741 0.6781274 67.8127412
742 0.4753298 47.5329776
743 0.5903515 59.0351549
744 0.2414768 24.1476774
745 0.5012186 50.1218557
746 0.0084865 0.8486539
747 0.0238280 2.3828005
748 0.6961407 69.6140661
749 0.6893480 68.9347966
750 0.0106398 1.0639795
751 0.6075523 60.7552254
752 0.6159540 61.5953994
753 0.7149131 71.4913141
754 0.7657023 76.5702346
755 0.6256341 62.5634087
756 0.2141573 21.4157252
757 0.0413393 4.1339305
758 0.7136250 71.3624960
759 0.6539310 65.3931004
760 0.7923811 79.2381124
761 0.4393673 43.9367336
762 0.6612049 66.1204939
763 0.4605414 46.0541386
764 0.6543847 65.4384667
765 0.6700612 67.0061168
766 0.6622161 66.2216127
767 0.1201993 12.0199343
768 0.4372077 43.7207651
769 0.7096609 70.9660887
770 0.0017517 0.1751669
771 0.1423880 14.2387995
772 0.0243497 2.4349684
773 0.2923641 29.2364081
774 0.4360757 43.6075731
775 0.4024918 40.2491823
776 0.0417578 4.1757759
777 0.0498983 4.9898272
778 0.7555309 75.5530900
779 0.0071212 0.7121244
780 0.1690063 16.9006318
781 0.0411840 4.1184004
782 0.4606185 46.0618458
783 0.7427114 74.2711450
784 0.6090365 60.9036532
785 0.4779479 47.7947876
786 0.5747285 57.4728454
787 0.7191315 71.9131485
788 0.0081431 0.8143090
789 0.0413902 4.1390174
790 0.0336405 3.3640482
791 0.0081669 0.8166857
792 0.0079745 0.7974451
793 0.0150339 1.5033891
794 0.6627478 66.2747751
795 0.1811977 18.1197663
796 0.1144861 11.4486116
797 0.7285250 72.8525008
798 0.6695442 66.9544180
799 0.7774384 77.7438400
800 0.5867304 58.6730418
801 0.6401162 64.0116228
802 0.5506348 55.0634849
803 0.5599968 55.9996850
804 0.5582615 55.8261475
805 0.6779270 67.7926962
806 0.7458949 74.5894938
807 0.5996043 59.9604267
808 0.4633017 46.3301692
809 0.4452128 44.5212827
810 0.5916701 59.1670072
811 0.7680847 76.8084658
812 0.6305112 63.0511181
813 0.6507459 65.0745942
814 0.6890656 68.9065644
815 0.5598757 55.9875719
816 0.6067827 60.6782682
817 0.7618261 76.1826139
818 0.6402691 64.0269146
819 0.7845464 78.4546372
820 0.7478458 74.7845844
821 0.6029951 60.2995119
822 0.7059040 70.5903967
823 0.5484710 54.8471000
824 0.2134107 21.3410660
825 0.1951407 19.5140691
826 0.6724627 67.2462735
827 0.7251474 72.5147431
828 0.5926741 59.2674082
829 0.6892051 68.9205147
830 0.7207573 72.0757328
831 0.7182121 71.8212132
832 0.7247775 72.4777532
833 0.2635111 26.3511104
834 0.6497313 64.9731275
835 0.6432216 64.3221619
836 0.4695685 46.9568504
837 0.3997215 39.9721545
838 0.6676752 66.7675222
839 0.6615329 66.1532882
840 0.7334046 73.3404628
841 0.0370295 3.7029474
842 0.5741133 57.4113291
843 0.5763403 57.6340336
844 0.7285411 72.8541115
845 0.0017360 0.1735978
846 0.0028661 0.2866131
847 0.1736796 17.3679555
848 0.0713510 7.1350957
849 0.0476882 4.7688196
850 0.2442258 24.4225771
851 0.4093489 40.9348946
852 0.3790762 37.9076211
853 0.5952616 59.5261597
854 0.6807751 68.0775064
855 0.0880757 8.8075673
856 0.0611032 6.1103178
857 0.6241467 62.4146700
858 0.7435328 74.3532773
859 0.0048221 0.4822054
860 0.0041779 0.4177907
861 0.3016050 30.1605049
862 0.0318906 3.1890588
863 0.4836012 48.3601169
864 0.4478030 44.7802989
865 0.4571252 45.7125194
866 0.6266151 62.6615138
867 0.6920106 69.2010593
868 0.5740709 57.4070944
869 0.4854253 48.5425293
870 0.6178118 61.7811825
871 0.6901013 69.0101261
872 0.0027874 0.2787429
873 0.0189687 1.8968724
874 0.0344826 3.4482639
875 0.0085654 0.8565366
876 0.0087594 0.8759408
877 0.7134107 71.3410731
878 0.6954438 69.5443782
879 0.7036993 70.3699342
880 0.1036503 10.3650287
881 0.6024182 60.2418174
882 0.0265527 2.6552729
883 0.0308419 3.0841939
884 0.5926558 59.2655843
885 0.6152683 61.5268324
886 0.6438128 64.3812788
887 0.4619180 46.1918007
888 0.5529789 55.2978879
889 0.5410580 54.1057965
890 0.7511699 75.1169885
891 0.6314530 63.1453003
892 0.4147654 41.4765359
893 0.0071324 0.7132364
894 0.0084865 0.8486539
895 0.7757557 77.5755741
896 0.5336955 53.3695524
897 0.6045588 60.4558797
898 0.7220361 72.2036098
899 0.6018589 60.1858921
900 0.0091156 0.9115604
901 0.5757924 57.5792401
902 0.5562997 55.6299720
903 0.6602173 66.0217326
904 0.7260884 72.6088414
905 0.7407887 74.0788694
906 0.6525974 65.2597415
907 0.2351286 23.5128600
908 0.1850370 18.5037045
909 0.6698844 66.9884438
910 0.6063208 60.6320845
911 0.6788018 67.8801790
912 0.6289924 62.8992351
913 0.6273730 62.7373018
914 0.6217316 62.1731620
915 0.7857718 78.5771818
916 0.3803508 38.0350842
917 0.6400379 64.0037868
918 0.6685645 66.8564485
919 0.4492720 44.9271958
920 0.5756261 57.5626128
921 0.6010155 60.1015533
922 0.5610502 56.1050241
923 0.7985888 79.8588767
924 0.1087395 10.8739478
925 0.4410988 44.1098758
926 0.7180468 71.8046763
927 0.6451064 64.5106438
928 0.0017351 0.1735119
929 0.0015585 0.1558505
930 0.1333689 13.3368904
931 0.0272195 2.7219488
932 0.7725791 77.2579084
933 0.4426760 44.2675967
934 0.6439590 64.3958969
935 0.0061188 0.6118752
936 0.0771225 7.7122501
937 0.0576974 5.7697432
938 0.7135463 71.3546315
939 0.0088378 0.8837818
940 0.0073452 0.7345238
941 0.1583809 15.8380857
942 0.1422805 14.2280520
943 0.0125977 1.2597743
944 0.6247363 62.4736267
945 0.5699835 56.9983541
946 0.7452892 74.5289204
947 0.6326725 63.2672487
948 0.7238778 72.3877762
949 0.5038563 50.3856260
950 0.5702350 57.0235016
951 0.7361197 73.6119697
952 0.7784025 77.8402503
953 0.0055628 0.5562762
954 0.6205464 62.0546420
955 0.7024907 70.2490681
956 0.4741562 47.4156234
957 0.4915059 49.1505931
958 0.1550527 15.5052727
959 0.6606871 66.0687135
960 0.5305289 53.0528863
961 0.3707336 37.0733577
962 0.5212248 52.1224797
963 0.0869716 8.6971589
964 0.6661697 66.6169732
965 0.4244689 42.4468937
966 0.1249300 12.4930031
967 0.1886079 18.8607923
968 0.6905086 69.0508613
969 0.1696326 16.9632611
970 0.2347847 23.4784713
971 0.5751581 57.5158062
972 0.6142535 61.4253470
973 0.5239999 52.3999877
974 0.4950075 49.5007549
975 0.6597114 65.9711436
976 0.5857331 58.5733072
977 0.7966926 79.6692623
978 0.6672786 66.7278553
979 0.7482086 74.8208615
980 0.5737703 57.3770311
981 0.1549826 15.4982560
982 0.6995725 69.9572463
983 0.2757444 27.5744391
984 0.5812454 58.1245367
985 0.8441471 84.4147083
986 0.5448058 54.4805781
987 0.7338398 73.3839775
988 0.5613904 56.1390423
989 0.7098719 70.9871919
990 0.0278827 2.7882675
991 0.5370518 53.7051831
992 0.4887298 48.8729803
993 0.7494984 74.9498406
994 0.4226139 42.2613897
995 0.3797306 37.9730565
996 0.2142524 21.4252423
997 0.1876790 18.7679038
998 0.1990525 19.9052549
999 0.1806834 18.0683361
1000 0.6137406 61.3740610
1001 0.5592364 55.9236442
1002 0.8217037 82.1703723
1003 0.4476300 44.7630036
1004 0.6161913 61.6191307
1005 0.5736113 57.3611284
1006 0.6295881 62.9588149
1007 0.5734784 57.3478380
1008 0.2858106 28.5810624
1009 0.2858106 28.5810624
1010 0.2624680 26.2467973
1011 0.5274688 52.7468849
1012 0.6660026 66.6002581
1013 0.8101068 81.0106763
1014 0.6373043 63.7304292
1015 0.7734276 77.3427613
1016 0.6457627 64.5762703
1017 0.4742414 47.4241421
1018 0.4485435 44.8543454
1019 0.4911639 49.1163927
1020 0.1809011 18.0901134
1021 0.2207601 22.0760077
1022 0.5503592 55.0359189
1023 0.7509601 75.0960128
1024 0.0855917 8.5591655
1025 0.1124620 11.2461950
1026 0.5342389 53.4238940
1027 0.7242970 72.4296973
1028 0.5680465 56.8046522
1029 0.5152781 51.5278064
1030 0.3413511 34.1351095
1031 0.6493653 64.9365283
1032 0.5755480 57.5547975
1033 0.5377472 53.7747231
1034 0.7120440 71.2044029
1035 0.1676558 16.7655841
1036 0.4640045 46.4004548
1037 0.6251752 62.5175241
1038 0.7704956 77.0495630
1039 0.6261333 62.6133274
1040 0.6980464 69.8046434
1041 0.6676895 66.7689479
1042 0.4021369 40.2136878
1043 0.6748173 67.4817341
1044 0.7440130 74.4012956
1045 0.3433567 34.3356689
1046 0.1828138 18.2813812
1047 0.5932279 59.3227929
1048 0.1735088 17.3508803
1049 0.6385488 63.8548752
1050 0.0873032 8.7303238
1051 0.5208626 52.0862564
1052 0.5698414 56.9841403
1053 0.7042545 70.4254490
1054 0.7401859 74.0185919
1055 0.7457381 74.5738051
1056 0.7678519 76.7851932
1057 0.5575662 55.7566197
1058 0.1481074 14.8107421
1059 0.1973903 19.7390321
1060 0.2987768 29.8776765
1061 0.6379986 63.7998583
1062 0.6177541 61.7754062
1063 0.4440102 44.4010241
1064 0.7230113 72.3011271
1065 0.5319951 53.1995091
1066 0.5958241 59.5824082
1067 0.5088318 50.8831768
1068 0.6696594 66.9659396
1069 0.5227548 52.2754822
1070 0.1661411 16.6141137
1071 0.2448549 24.4854877
1072 0.5610460 56.1045990
1073 0.8220211 82.2021096
1074 0.6704496 67.0449634
1075 0.5734784 57.3478380
1076 0.3036569 30.3656897
1077 0.1119180 11.1917969
1078 0.5538162 55.3816157
1079 0.1848321 18.4832140
1080 0.3588620 35.8862025
1081 0.8096300 80.9630040
1082 0.3544428 35.4442817
1083 0.4368204 43.6820422
1084 0.5237627 52.3762729
1085 0.1720313 17.2031334
1086 0.5610863 56.1086300
1087 0.7414897 74.1489688
1088 0.7379332 73.7933217
1089 0.6901630 69.0163013
1090 0.7107189 71.0718935
1091 0.5152999 51.5299890
1092 0.7821768 78.2176765
1093 0.7224728 72.2472773
1094 0.7024019 70.2401850
1095 0.2581375 25.8137545
1096 0.3913462 39.1346220
1097 0.8293822 82.9382198
1098 0.6961022 69.6102161
1099 0.6181906 61.8190610
1100 0.7289784 72.8978354
1101 0.3775179 37.7517900
1102 0.4240884 42.4088414
1103 0.5401989 54.0198874
1104 0.6317616 63.1761571
1105 0.4963100 49.6309978
1106 0.4326950 43.2694985
1107 0.5474256 54.7425563
1108 0.1325434 13.2543369
1109 0.6444189 64.4418884
1110 0.4036743 40.3674294
1111 0.1164088 11.6408846
1112 0.1967532 19.6753228
1113 0.6274994 62.7499357
1114 0.2321254 23.2125417
1115 0.1144580 11.4458047
1116 0.5166352 51.6635242
1117 0.6586039 65.8603851
1118 0.5600544 56.0054351
1119 0.5144061 51.4406067
1120 0.6610531 66.1053101
1121 0.7498037 74.9803722
1122 0.7696773 76.9677275
1123 0.7020121 70.2012106
1124 0.5644417 56.4441675
1125 0.6346934 63.4693439
1126 0.6056726 60.5672630
1127 0.2609482 26.0948230
1128 0.2765534 27.6553411
1129 0.5801855 58.0185539
1130 0.7058168 70.5816817
1131 0.5781652 57.8165240
1132 0.6060178 60.6017788
1133 0.5410752 54.1075163
1134 0.7469949 74.6994943
1135 0.5319951 53.1995091
1136 0.5688551 56.8855127
1137 0.5146313 51.4631304
1138 0.1872476 18.7247559
1139 0.6337386 63.3738589
1140 0.4010591 40.1059133
1141 0.2369931 23.6993108
1142 0.2051882 20.5188152
1143 0.2026121 20.2612143
1144 0.1560607 15.6060706
1145 0.5323571 53.2357078
1146 0.6743455 67.4345487
1147 0.8007512 80.0751198
1148 0.6991724 69.9172430
1149 0.6343062 63.4306188
1150 0.5065931 50.6593142
1151 0.5508741 55.0874133
1152 0.5734784 57.3478380
1153 0.2431648 24.3164771
1154 0.2634580 26.3458033
1155 0.1146556 11.4655578
1156 0.6677902 66.7790212
1157 0.2185585 21.8558488
1158 0.7980233 79.8023293
1159 0.5419888 54.1988759
1160 0.6794724 67.9472437
1161 0.6204501 62.0450092
1162 0.3786243 37.8624265
1163 0.4662679 46.6267868
1164 0.5324940 53.2493968
1165 0.2618456 26.1845619
1166 0.2047178 20.4717796
1167 0.5661104 56.6110445
1168 0.7750599 77.5059891
1169 0.1086000 10.8599968
1170 0.1357219 13.5721912
1171 0.5275158 52.7515799
1172 0.6855799 68.5579937
1173 0.5180695 51.8069512
1174 0.5566192 55.6619181
1175 0.8275527 82.7552708
1176 0.6422664 64.2266409
1177 0.5767241 57.6724146
1178 0.5955254 59.5525404
1179 0.6121506 61.2150631
1180 0.1905128 19.0512796
1181 0.4554872 45.5487151
1182 0.6252887 62.5288717
1183 0.7988481 79.8848125
1184 0.6330949 63.3094937
1185 0.5386364 53.8636403
1186 0.6678639 66.7863869
1187 0.6691691 66.9169148
1188 0.4531015 45.3101505
1189 0.3542611 35.4261116
1190 0.7529784 75.2978410
1191 0.7457381 74.5738051
1192 0.6963083 69.6308288
1193 0.7077768 70.7776829
1194 0.4073508 40.7350830
1195 0.7030693 70.3069257
1196 0.5511910 55.1190997
1197 0.6900836 69.0083627
1198 0.8049876 80.4987601
1199 0.6345607 63.4560749
1200 0.6016632 60.1663198
1201 0.4214641 42.1464131
1202 0.3475588 34.7558764
1203 0.7204851 72.0485081
1204 0.7061254 70.6125439
1205 0.6719489 67.1948922
1206 0.5925991 59.2599149
1207 0.5608664 56.0866383
1208 0.6320411 63.2041078
1209 0.5069504 50.6950437
1210 0.7372625 73.7262487
1211 0.4586009 45.8600917
1212 0.3475588 34.7558764
1213 0.3410022 34.1002171
1214 0.4886112 48.8611203
1215 0.3163391 31.6339134
1216 0.6413309 64.1330941
1217 0.7678320 76.7832003
1218 0.4513767 45.1376657
1219 0.5760388 57.6038831
1220 0.5497514 54.9751374
1221 0.6877159 68.7715896
1222 0.6264229 62.6422915
1223 0.4712877 47.1287709
1224 0.7440178 74.4017778
1225 0.6710302 67.1030166
1226 0.4804506 48.0450593
1227 0.6936493 69.3649283
1228 0.4799606 47.9960557
1229 0.7295821 72.9582076
1230 0.7688107 76.8810748
1231 0.6994427 69.9442693
1232 0.7184174 71.8417377
1233 0.5261331 52.6133092
1234 0.5526587 55.2658690
1235 0.3627964 36.2796378
1236 0.8219802 82.1980235
1237 0.7532168 75.3216764
1238 0.6394413 63.9441304
1239 0.6198222 61.9822247
1240 0.4636182 46.3618204
1241 0.4363012 43.6301160
1242 0.7268337 72.6833731
1243 0.6205127 62.0512656
1244 0.6200760 62.0075992
1245 0.6137678 61.3767791
1246 0.6994149 69.9414891
1247 0.3710370 37.1037043
1248 0.7683628 76.8362766
1249 0.3086090 30.8608983
1250 0.5564388 55.6438834
1251 0.6388819 63.8881914
1252 0.6011072 60.1107231
1253 0.5720404 57.2040450
1254 0.7467700 74.6769977
1255 0.6516268 65.1626812
1256 0.6666888 66.6688783
1257 0.7679775 76.7977519
1258 0.4506189 45.0618853
1259 0.3702531 37.0253055
1260 0.7440889 74.4088888
1261 0.7453830 74.5383001
1262 0.7601213 76.0121339
1263 0.6174758 61.7475831
1264 0.5402315 54.0231508
1265 0.4876354 48.7635427
1266 0.5857628 58.5762807
1267 0.5014742 50.1474168
1268 0.6679964 66.7996361
1269 0.6525707 65.2570735
1270 0.7382250 73.8225021
1271 0.6392006 63.9200571
1272 0.5872645 58.7264468
1273 0.6181797 61.8179672
1274 0.3475588 34.7558764
1275 0.3475588 34.7558764
1276 0.6817065 68.1706537
1277 0.7848758 78.4875774
1278 0.6433038 64.3303824
1279 0.4131395 41.3139537
1280 0.4766804 47.6680402
1281 0.6381470 63.8146975
1282 0.6210806 62.1080637
1283 0.6534779 65.3477939
1284 0.6760896 67.6089551
1285 0.5487065 54.8706498
1286 0.6684076 66.8407627
1287 0.7820332 78.2033155
1288 0.5541087 55.4108680
1289 0.7524738 75.2473800
1290 0.7321074 73.2107379
1291 0.6320580 63.2057954
1292 0.7004048 70.0404836
1293 0.4823097 48.2309721
1294 0.4517772 45.1777153
1295 0.3952967 39.5296715
1296 0.5411254 54.1125359
1297 0.4578830 45.7883005
1298 0.6493808 64.9380828
1299 0.7919025 79.1902542
1300 0.5780697 57.8069720
1301 0.4527096 45.2709575
1302 0.6290200 62.9020019
1303 0.5893543 58.9354332
1304 0.5619614 56.1961351
1305 0.5650799 56.5079852
1306 0.6840903 68.4090321
1307 0.6858579 68.5857861
1308 0.6636075 66.3607523
1309 0.5445072 54.4507202
1310 0.6749040 67.4903978
1311 0.6751833 67.5183313
1312 0.4978144 49.7814432
1313 0.6133850 61.3385026
1314 0.5218184 52.1818360
1315 0.5144250 51.4424957
1316 0.7140444 71.4044350
1317 0.7033797 70.3379746
1318 0.7345256 73.4525557
1319 0.7416496 74.1649635
1320 0.6150124 61.5012445
1321 0.5154600 51.5459989
1322 0.7328547 73.2854688
1323 0.8216174 82.1617404
1324 0.8333095 83.3309530
1325 0.7307752 73.0775152
1326 0.6163349 61.6334862
1327 0.6366562 63.6656177
1328 0.4500847 45.0084686
1329 0.4574062 45.7406160
1330 0.6476075 64.7607469
1331 0.6120477 61.2047748
1332 0.5794976 57.9497558
1333 0.5760388 57.6038831
1334 0.6832886 68.3288614
1335 0.6705528 67.0552840
1336 0.5745826 57.4582573
1337 0.4230753 42.3075285
1338 0.7439638 74.3963825
1339 0.5529991 55.2999140
1340 0.5242280 52.4228008
1341 0.5607309 56.0730873
1342 0.7291109 72.9110871
1343 0.6384006 63.8400585
1344 0.5737450 57.3744953
1345 0.5655255 56.5525536
1346 0.6776510 67.7650973
1347 0.7317124 73.1712445
1348 0.5880751 58.8075124
1349 0.5354294 53.5429417
1350 0.6608151 66.0815088
1351 0.6806354 68.0635424
1352 0.6726348 67.2634801
1353 0.7091629 70.9162928
1354 0.5455292 54.5529183
1355 0.5989521 59.8952068
1356 0.4694946 46.9494646
1357 0.4618547 46.1854690
1358 0.6011264 60.1126448
1359 0.6314266 63.1426636
1360 0.5478119 54.7811921
1361 0.7800565 78.0056454
1362 0.6350716 63.5071574
1363 0.6525840 65.2584037
1364 0.6886539 68.8653862
1365 0.6103564 61.0356381
1366 0.6048646 60.4864576
1367 0.6033426 60.3342612
1368 0.7321116 73.2111593
1369 0.8090352 80.9035222
1370 0.3995443 39.9544330
1371 0.4299491 42.9949080
1372 0.7580096 75.8009552
1373 0.6567346 65.6734565
1374 0.7095631 70.9563148
1375 0.6867239 68.6723885
1376 0.6346210 63.4621003
1377 0.7618258 76.1825789
1378 0.5836876 58.3687645
1379 0.4863260 48.6325988
1380 0.5719161 57.1916124
1381 0.5863264 58.6326438
1382 0.3475588 34.7558764
1383 0.3475588 34.7558764
1384 0.6838312 68.3831164
1385 0.6758657 67.5865713
1386 0.6685547 66.8554680
1387 0.7726019 77.2601866
1388 0.7020524 70.2052403
1389 0.8723607 87.2360736
1390 0.5993206 59.9320648
1391 0.5939974 59.3997445
1392 0.3167235 31.6723452
1393 0.3065296 30.6529558
1394 0.5977289 59.7728935
1395 0.6863614 68.6361375
1396 0.6600241 66.0024080
1397 0.5193106 51.9310628
1398 0.6211512 62.1151230
1399 0.6309958 63.0995769
1400 0.5265735 52.6573503
1401 0.4600980 46.0098010
1402 0.4077222 40.7722213
1403 0.3475588 34.7558764
1404 0.3338127 33.3812729
1405 0.4378650 43.7865016
1406 0.4644914 46.4491407
1407 0.4800080 48.0008042
1408 0.5863949 58.6394933
1409 0.2828301 28.2830138
1410 0.5204361 52.0436093
1411 0.5491408 54.9140764
1412 0.4402332 44.0233179
1413 0.8391800 83.9179958
1414 0.6103062 61.0306217
1415 0.4434498 44.3449835
1416 0.5760388 57.6038831
1417 0.5799350 57.9934951
1418 0.5286453 52.8645286
1419 0.5707369 57.0736940
1420 0.7419271 74.1927114
1421 0.7193134 71.9313397
1422 0.6929512 69.2951162
1423 0.7006158 70.0615850
1424 0.4427851 44.2785091
1425 0.4438477 44.3847684
1426 0.7184108 71.8410829
1427 0.7449837 74.4983747
1428 0.7103843 71.0384299
1429 0.6981183 69.8118256
1430 0.5840178 58.4017776
1431 0.4753960 47.5395965
1432 0.7128947 71.2894693
1433 0.6774148 67.7414775
1434 0.6205397 62.0539686
1435 0.5901308 59.0130790
1436 0.7529853 75.2985346
1437 0.7144487 71.4448689
1438 0.7172897 71.7289727
1439 0.7133393 71.3339295
1440 0.7200395 72.0039455
1441 0.6405483 64.0548310
1442 0.7802957 78.0295666
1443 0.7621976 76.2197555
1444 0.5384136 53.8413626
1445 0.6084332 60.8433229
1446 0.5435514 54.3551426
1447 0.5397383 53.9738345
1448 0.7406867 74.0686726
1449 0.3753286 37.5328587
1450 0.8008112 80.0811189
1451 0.8007796 80.0779580
1452 0.8096835 80.9683477
1453 0.7110867 71.1086745
1454 0.5771564 57.7156447
1455 0.5771626 57.7162574
1456 0.6385301 63.8530105
1457 0.5256639 52.5663890
1458 0.4502762 45.0276213
1459 0.4737114 47.3711362
1460 0.5911564 59.1156396
1461 0.4829102 48.2910192
1462 0.7070397 70.7039656
1463 0.6653173 66.5317272
1464 0.5939933 59.3993274
1465 0.6068536 60.6853579
1466 0.6200158 62.0015828
1467 0.6479222 64.7922208
1468 0.6366408 63.6640796
1469 0.6339147 63.3914739
1470 0.5630228 56.3022794
1471 0.6339813 63.3981257
1472 0.4271182 42.7118158
1473 0.4811885 48.1188543
1474 0.7832908 78.3290752
1475 0.6468850 64.6884971
1476 0.4444633 44.4463260
1477 0.3893579 38.9357877
1478 0.4373671 43.7367091
1479 0.5545472 55.4547154
1480 0.7848051 78.4805135
1481 0.6992212 69.9221222
1482 0.5683102 56.8310183
1483 0.6271922 62.7192191
1484 0.5795068 57.9506780
1485 0.5684457 56.8445719
1486 0.4591237 45.9123664
1487 0.7736439 77.3643879
1488 0.6149273 61.4927325
1489 0.6330536 63.3053610
1490 0.5792931 57.9293076
1491 0.3999262 39.9926198
1492 0.6708962 67.0896191
1493 0.7162285 71.6228464
1494 0.6347735 63.4773490
1495 0.6896935 68.9693517
1496 0.1585979 15.8597932
1497 0.2426365 24.2636516
1498 0.6667119 66.6711946
1499 0.7202971 72.0297110
1500 0.7862222 78.6222202
1501 0.7549180 75.4918007
1502 0.7786847 77.8684650
1503 0.7555932 75.5593164
1504 0.4904442 49.0444232
1505 0.4732078 47.3207760
1506 0.5756460 57.5645952
1507 0.6570557 65.7055730
1508 0.8435043 84.3504283
1509 0.8249221 82.4922150
1510 0.7090446 70.9044620
1511 0.7664622 76.6462150
1512 0.5262097 52.6209690
1513 0.3220429 32.2042937
1514 0.7354270 73.5427018
1515 0.7440179 74.4017931
1516 0.5940078 59.4007800
1517 0.7085785 70.8578510
1518 0.5813071 58.1307131
1519 0.6474855 64.7485508
1520 0.5434082 54.3408203
1521 0.6054146 60.5414566
1522 0.6292256 62.9225577
1523 0.8295869 82.9586928
1524 0.6744118 67.4411838
1525 0.5984749 59.8474910
1526 0.4606885 46.0688497
1527 0.4590854 45.9085417
1528 0.5061724 50.6172405
1529 0.5367232 53.6723219
1530 0.8543672 85.4367158
1531 0.8462650 84.6264968
1532 0.6913774 69.1377356
1533 0.7591751 75.9175055
1534 0.5414005 54.1400521
1535 0.3839413 38.3941252
1536 0.6693435 66.9343469
1537 0.5403985 54.0398486
1538 0.5139344 51.3934442
1539 0.5662972 56.6297151
1540 0.6756676 67.5667612
1541 0.7497381 74.9738075
1542 0.6809815 68.0981501
1543 0.6834172 68.3417220
1544 0.6058600 60.5860045
1545 0.4546479 45.4647896
1546 0.7661018 76.6101805
1547 0.7435961 74.3596116
1548 0.5896881 58.9688134
1549 0.6604919 66.0491938
1550 0.6684013 66.8401278
1551 0.4968478 49.6847760
1552 0.5654390 56.5438986
1553 0.4892896 48.9289556
1554 0.7111773 71.1177350
1555 0.7855772 78.5577203
1556 0.7648403 76.4840263
1557 0.6121485 61.2148517
1558 0.6806735 68.0673528
1559 0.4954816 49.5481637
1560 0.8752497 87.5249696
1561 0.8398486 83.9848621
1562 0.5144748 51.4474753
1563 0.6067580 60.6757997
1564 0.6699718 66.9971764
1565 0.5534597 55.3459687
1566 0.5896165 58.9616546
1567 0.5890660 58.9066047
1568 0.7381908 73.8190838
1569 0.7232887 72.3288743
1570 0.8513325 85.1332539
1571 0.7795582 77.9558198
1572 0.7909871 79.0987108
1573 0.7638650 76.3864990
1574 0.7678485 76.7848545
1575 0.7145228 71.4522775
1576 0.7603677 76.0367687
1577 0.7637874 76.3787377
1578 0.4881771 48.8177147
1579 0.4080169 40.8016918
1580 0.6861943 68.6194299
1581 0.7354099 73.5409890
1582 0.6455605 64.5560497
1583 0.6570501 65.7050111
1584 0.6085522 60.8552241
1585 0.6925344 69.2534441
1586 0.6120807 61.2080728
1587 0.5523153 55.2315252
1588 0.7049698 70.4969767
1589 0.7251424 72.5142443
1590 0.6224570 62.2456977
1591 0.6024770 60.2477006
1592 0.6859665 68.5966501
1593 0.6384732 63.8473223
1594 0.5349091 53.4909106
1595 0.5380309 53.8030862
1596 0.8576007 85.7600743
1597 0.8538293 85.3829268
1598 0.7364674 73.6467380
1599 0.7137496 71.3749631
1600 0.6111299 61.1129937
1601 0.5659056 56.5905574
1602 0.5588792 55.8879201
1603 0.5473067 54.7306689
1604 0.5240759 52.4075925
1605 0.5299343 52.9934294
1606 0.6800413 68.0041265
1607 0.7298647 72.9864716
1608 0.6061360 60.6136026
1609 0.6871987 68.7198701
1610 0.6024300 60.2429995
1611 0.5419076 54.1907640
1612 0.7039456 70.3945628
1613 0.6878278 68.7827781
1614 0.5772637 57.7263700
1615 0.6302319 63.0231885
1616 0.6525596 65.2559580
1617 0.5575259 55.7525862
1618 0.5721306 57.2130577
1619 0.4725561 47.2556144
1620 0.7747529 77.4752853
1621 0.7979923 79.7992294
1622 0.7904245 79.0424541
1623 0.6697056 66.9705649
1624 0.5571924 55.7192352
1625 0.5841610 58.4160977
1626 0.9196191 91.9619106
1627 0.8703945 87.0394486
1628 0.4898761 48.9876083
1629 0.5256696 52.5669600
1630 0.6903564 69.0356426
1631 0.5146522 51.4652154
1632 0.5725345 57.2534504
1633 0.5667457 56.6745744
1634 0.7405160 74.0516000
1635 0.7157401 71.5740108
1636 0.8384412 83.8441200
1637 0.8376005 83.7600502
1638 0.7923070 79.2307012
1639 0.8387813 83.8781279
1640 0.6849792 68.4979244
1641 0.7432413 74.3241328
1642 0.7661760 76.6176020
1643 0.7579389 75.7938927
1644 0.5582933 55.8293290
1645 0.5512412 55.1241165
1646 0.4768655 47.6865506
1647 0.5253346 52.5334608
1648 0.7212864 72.1286365
1649 0.7291773 72.9177339
1650 0.7265404 72.6540435
1651 0.7816822 78.1682182
1652 0.8375534 83.7553353
1653 0.8411063 84.1106345
1654 0.7555547 75.5554692
1655 0.8011935 80.1193452
1656 0.5864336 58.6433553
1657 0.6532590 65.3259008
1658 0.6600660 66.0065950
1659 0.6629517 66.2951652
1660 0.4985569 49.8556935
1661 0.4857290 48.5729038
1662 0.5504245 55.0424550
1663 0.5358991 53.5899103
1664 0.7228988 72.2898832
1665 0.5934828 59.3482840
1666 0.8082039 80.8203937
1667 0.8147028 81.4702759
1668 0.5931978 59.3197827
1669 0.4738386 47.3838584
1670 0.6269707 62.6970669
1671 0.6006211 60.0621057
1672 0.4614444 46.1444375
1673 0.4591723 45.9172295
1674 0.4816171 48.1617091
1675 0.5443467 54.4346705
1676 0.5238999 52.3899871
1677 0.5523284 55.2328367
1678 0.7202874 72.0287403
1679 0.7127191 71.2719086
1680 0.8340938 83.4093850
1681 0.8452895 84.5289522
1682 0.8282279 82.8227877
1683 0.8093840 80.9384041
1684 0.7084742 70.8474231
1685 0.7317190 73.1719014
1686 0.7757926 77.5792578
1687 0.6787927 67.8792727
1688 0.5436202 54.3620178
1689 0.5122517 51.2251651
1690 0.5333865 53.3386530
1691 0.4220851 42.2085125
1692 0.6559741 65.5974124
1693 0.6535970 65.3596956
1694 0.6299248 62.9924798
1695 0.5650951 56.5095088
1696 0.4132573 41.3257253
1697 0.4072683 40.7268321
1698 0.5455739 54.5573911
1699 0.5978746 59.7874577
1700 0.6556596 65.5659555
1701 0.6950753 69.5075312
1702 0.7408545 74.0854482
1703 0.7812799 78.1279897
1704 0.8410576 84.1057605
1705 0.7959698 79.5969817
1706 0.6778438 67.7843766
1707 0.7196381 71.9638137
1708 0.6322262 63.2226174
1709 0.6182891 61.8289062
1710 0.5617127 56.1712733
1711 0.4055448 40.5544772
1712 0.7598308 75.9830821
1713 0.7570076 75.7007560
1714 0.7152619 71.5261896
1715 0.6879943 68.7994299
1716 0.6220015 62.2001540
1717 0.5780194 57.8019416
1718 0.6941350 69.4135007
1719 0.6710064 67.1006369
1720 0.6799852 67.9985161
1721 0.6830390 68.3039036
1722 0.6355277 63.5527744
1723 0.5811562 58.1156162
1724 0.5440928 54.4092797
1725 0.5922054 59.2205376
1726 0.4422965 44.2296494
1727 0.5300154 53.0015415
1728 0.7193423 71.9342326
1729 0.7476753 74.7675251
1730 0.7752757 77.5275717
1731 0.7668160 76.6815952
1732 0.7713230 77.1322999
1733 0.7898748 78.9874834
1734 0.7353084 73.5308411
1735 0.7789155 77.8915473
1736 0.5762375 57.6237539
1737 0.6771435 67.7143505
1738 0.5496779 54.9677944
1739 0.5463499 54.6349897
1740 0.8856298 88.5629821
1741 0.8347834 83.4783355
1742 0.6703417 67.0341743
1743 0.7466858 74.6685767
1744 0.5005132 50.0513200
1745 0.4921203 49.2120267
1746 0.5185029 51.8502855
1747 0.5118873 51.1887263
1748 0.6946659 69.4665936
1749 0.7097727 70.9772653
1750 0.5300602 53.0060166
1751 0.6371216 63.7121569
1752 0.5899637 58.9963736
1753 0.6060959 60.6095870
1754 0.5961057 59.6105699
1755 0.5953684 59.5368420
1756 0.7623560 76.2356042
1757 0.7644903 76.4490304
1758 0.6871955 68.7195528
1759 0.7014726 70.1472606
1760 0.7724374 77.2437421
1761 0.8497343 84.9734292
1762 0.6655028 66.5502753
1763 0.5444411 54.4441131
1764 0.7965216 79.6521630
1765 0.8112764 81.1276432
1766 0.7264058 72.6405786
1767 0.8373443 83.7344330
1768 0.7062329 70.6232916
1769 0.7339164 73.3916376
1770 0.7124479 71.2447906
1771 0.7608319 76.0831867
1772 0.7635242 76.3524192
1773 0.7719169 77.1916856
1774 0.7478822 74.7882170
1775 0.7458544 74.5854366
1776 0.7966754 79.6675428
1777 0.8329601 83.2960095
1778 0.7519738 75.1973788
1779 0.7133986 71.3398556
1780 0.8846145 88.4614490
1781 0.8383046 83.8304606
1782 0.8219069 82.1906929
1783 0.8376134 83.7613372
1784 0.8266574 82.6657442
1785 0.6935256 69.3525561
1786 0.8674535 86.7453540
1787 0.8648594 86.4859360
1788 0.7007357 70.0735652
1789 0.8362819 83.6281946
1790 0.7962407 79.6240684
1791 0.8157202 81.5720186
1792 0.7926413 79.2641253
1793 0.8551771 85.5177106
1794 0.8882487 88.8248718
1795 0.8820033 88.2003310
1796 0.8385471 83.8547061
1797 0.8708717 87.0871747
1798 0.7371941 73.7194095
1799 0.8218966 82.1896572
1800 0.7985324 79.8532404
1801 0.8036748 80.3674791
1802 0.7412658 74.1265751
1803 0.7106440 71.0644014
1804 0.8915420 89.1542046
1805 0.8846914 88.4691414
1806 0.8102412 81.0241245
1807 0.8178385 81.7838485
1808 0.8018094 80.1809442
1809 0.6957896 69.5789640
1810 0.8702541 87.0254137
1811 0.8785755 87.8575512
1812 0.6984923 69.8492271
1813 0.7692650 76.9265014
1814 0.8318442 83.1844201
1815 0.8123319 81.2331871
1816 0.7912376 79.1237595
1817 0.7930289 79.3028886
1818 0.8811388 88.1138808
1819 0.8777870 87.7786979
1820 0.8480779 84.8077913
1821 0.8438361 84.3836068
1822 0.7972951 79.7295126
1823 0.7811581 78.1158145
1824 0.8279607 82.7960654
1825 0.7944625 79.4462538
1826 0.8625813 86.2581265
1827 0.8208887 82.0888712
1828 0.7765044 77.6504434
1829 0.7681315 76.8131520
1830 0.7253443 72.5344330
1831 0.7937804 79.3780406
1832 0.8859761 88.5976132
1833 0.8775792 87.7579209
1834 0.8691174 86.9117436
1835 0.8554483 85.5448259
1836 0.8465634 84.6563363
1837 0.8042225 80.4222514
1838 0.8586821 85.8682137
1839 0.8452692 84.5269231
1840 0.7847894 78.4789436
1841 0.7382547 73.8254747
1842 0.7427233 74.2723325
1843 0.7267124 72.6712399
1844 0.8694480 86.9448011
1845 0.8725372 87.2537234
1846 0.8790247 87.9024735
1847 0.8581538 85.8153750
1848 0.7437044 74.3704428
1849 0.7308998 73.0899833
1850 0.8396441 83.9644142
1851 0.8594965 85.9496472
1852 0.7748617 77.4861707
1853 0.8173303 81.7330272
1854 0.7845349 78.4534941
1855 0.7575617 75.7561676
1856 0.7626402 76.2640165
1857 0.7781900 77.8189967
1858 0.8705226 87.0522597
1859 0.8738647 87.3864732
1860 0.8764917 87.6491674
1861 0.8843747 88.4374665
1862 0.8809957 88.0995748
1863 0.8882331 88.8233128
1864 0.8473305 84.7330529
1865 0.8437116 84.3711575
1866 0.8646268 86.4626789
1867 0.8437901 84.3790098
1868 0.7266368 72.6636841
1869 0.7264817 72.6481703
1870 0.7847039 78.4703899
1871 0.7866324 78.6632400
1872 0.7988312 79.8831181
1873 0.8176293 81.7629287
1874 0.7340333 73.4033274
1875 0.7095180 70.9517990
1876 0.8929374 89.2937431
1877 0.8883107 88.8310712
1878 0.7958870 79.5887006
1879 0.8008611 80.0861116
1880 0.7752566 77.5256559
1881 0.7830518 78.3051782
1882 0.8739329 87.3932904
1883 0.8788371 87.8837101
1884 0.7027864 70.2786442
1885 0.7415637 74.1563742
1886 0.8259119 82.5911894
1887 0.8344155 83.4415461
1888 0.7902916 79.0291551
1889 0.7934219 79.3421903
1890 0.8756924 87.5692428
1891 0.8746447 87.4644686
1892 0.8633161 86.3316131
1893 0.8663519 86.6351852
1894 0.7570562 75.7056214
1895 0.7729747 77.2974683
1896 0.7979698 79.7969751
1897 0.8024145 80.2414452
1898 0.8265430 82.6543002
1899 0.8243630 82.4363015
1900 0.7972615 79.7261516
1901 0.7348094 73.4809400
1902 0.7233569 72.3356866
1903 0.7123814 71.2381399
1904 0.8851190 88.5118980
1905 0.8868582 88.6858225
1906 0.8489173 84.8917262
1907 0.8272984 82.7298370
1908 0.8191493 81.9149290
1909 0.8224010 82.2400981
1910 0.8444644 84.4464402
1911 0.8448141 84.4814053
1912 0.7971817 79.7181747
1913 0.8249857 82.4985687
1914 0.7198909 71.9890884
1915 0.7560465 75.6046533
1916 0.8689642 86.8964185
1917 0.8708849 87.0884917
1918 0.8709717 87.0971696
1919 0.8658384 86.5838379
1920 0.7088083 70.8808274
1921 0.7300769 73.0076936
1922 0.8356794 83.5679446
1923 0.8468295 84.6829516
1924 0.7375011 73.7501125
1925 0.7872022 78.7202170
1926 0.7813855 78.1385514
1927 0.7543535 75.4353483
1928 0.7749494 77.4949378
1929 0.7837129 78.3712936
1930 0.8694506 86.9450599
1931 0.8673847 86.7384705
1932 0.8853322 88.5332226
1933 0.8866699 88.6669871
1934 0.8813172 88.1317164
1935 0.8841565 88.4156487
1936 0.8465471 84.6547146
1937 0.8429365 84.2936540
1938 0.8660243 86.6024289
1939 0.8692840 86.9283956
1940 0.7346886 73.4688561
1941 0.7354591 73.5459132
1942 0.8177983 81.7798298
1943 0.7951450 79.5145013
1944 0.7990162 79.9016195
1945 0.7954398 79.5439809
1946 0.8106324 81.0632442
1947 0.7966119 79.6611871
1948 0.7749770 77.4977042
1949 0.7608330 76.0832953
1950 0.7217117 72.1711716
1951 0.7116624 71.1662383
1952 0.8852428 88.5242805
1953 0.8873024 88.7302385
1954 0.8822187 88.2218716
1955 0.8795771 87.9577149
1956 0.8075913 80.7591311
1957 0.8136368 81.3636831
1958 0.7956538 79.5653784
1959 0.8165322 81.6532218
1960 0.8042843 80.4284306
1961 0.8149461 81.4946116
1962 0.6977912 69.7791180
1963 0.7019645 70.1964534
1964 0.8752568 87.5256781
1965 0.8679948 86.7994756
1966 0.8773184 87.7318406
1967 0.8776655 87.7665542
1968 0.6958990 69.5898986
1969 0.7584640 75.8464007
1970 0.7769687 77.6968721
1971 0.7491992 74.9199177
1972 0.7722925 77.2292492
1973 0.7686899 76.8689862
1974 0.7277308 72.7730820
1975 0.7978404 79.7840406
1976 0.7745070 77.4507012
1977 0.7850056 78.5005605
1978 0.7816788 78.1678846
1979 0.7841407 78.4140725
1980 0.8818554 88.1855407
1981 0.8824678 88.2467842
1982 0.8782012 87.8201187
1983 0.8791827 87.9182695
1984 0.8566609 85.6660869
1985 0.8617684 86.1768376
1986 0.8355519 83.5551918
1987 0.8347871 83.4787145
1988 0.7938816 79.3881585
1989 0.7912250 79.1225037
1990 0.7876284 78.7628386
1991 0.7792801 77.9280057
1992 0.8036519 80.3651863
1993 0.8261256 82.6125612
1994 0.7933506 79.3350618
1995 0.8267883 82.6788291
1996 0.8513827 85.1382740
1997 0.8480176 84.8017629
1998 0.8307679 83.0767922
1999 0.8293068 82.9306766
2000 0.8241486 82.4148616
2001 0.7807754 78.0775381
2002 0.7767469 77.6746861
2003 0.7410551 74.1055142
2004 0.7312928 73.1292845
2005 0.7693980 76.9398012
2006 0.8117184 81.1718420
2007 0.7957168 79.5716767
2008 0.8866262 88.6626248
2009 0.8871342 88.7134173
2010 0.8820639 88.2063885
2011 0.8774943 87.7494265
2012 0.8691849 86.9184909
2013 0.8555591 85.5559128
2014 0.8627070 86.2706982
2015 0.8601523 86.0152295
2016 0.8524661 85.2466086
2017 0.8592337 85.9233732
2018 0.8146659 81.4665879
2019 0.7974826 79.7482630
2020 0.8599496 85.9949640
2021 0.8431479 84.3147897
2022 0.8511391 85.1139137
2023 0.8548430 85.4842952
2024 0.7503982 75.0398197
2025 0.7551437 75.5143660
2026 0.7780318 77.8031811
2027 0.7670002 76.7000175
2028 0.7467153 74.6715258
2029 0.7360228 73.6022841
2030 0.7191727 71.9172684
2031 0.7278124 72.7812365
2032 0.8627510 86.2751004
2033 0.8690252 86.9025172
2034 0.8784922 87.8492222
2035 0.8766366 87.6636576
2036 0.8761732 87.6173199
2037 0.8802008 88.0200802
2038 0.8656374 86.5637372
2039 0.8625386 86.2538632
2040 0.7313200 73.1320016
2041 0.7283624 72.8362448
2042 0.7293907 72.9390662
2043 0.7153166 71.5316650
2044 0.8413340 84.1333954
2045 0.8523180 85.2317953
2046 0.8535384 85.3538433
2047 0.8718416 87.1841613
2048 0.8038950 80.3894997
2049 0.7537996 75.3799646
2050 0.8119550 81.1955000
2051 0.7408476 74.0847646
2052 0.7744486 77.4448583
2053 0.7437637 74.3763716
2054 0.7417618 74.1761786
2055 0.7660602 76.6060186
2056 0.7747749 77.4774892
2057 0.7733874 77.3387376
2058 0.7719511 77.1951059
2059 0.7752984 77.5298442
2060 0.8646085 86.4608489
2061 0.8868204 88.6820355
2062 0.8857528 88.5752803
2063 0.8811031 88.1103127
2064 0.8768978 87.6897821
2065 0.8771713 87.7171309
2066 0.8826142 88.2614201
2067 0.8857133 88.5713306
2068 0.8804579 88.0457914
2069 0.8805129 88.0512949
2070 0.8865513 88.6551323
2071 0.8863854 88.6385374
2072 0.8440604 84.4060406
2073 0.8462981 84.6298058
2074 0.8593491 85.9349137
2075 0.8536960 85.3695990
2076 0.8455539 84.5553929
2077 0.8627978 86.2797775
2078 0.8621051 86.2105142
2079 0.8618003 86.1800328
2080 0.7296193 72.9619345
2081 0.7285948 72.8594820
2082 0.7225571 72.2557105
2083 0.7231812 72.3181184
2084 0.7774568 77.7456826
2085 0.7710593 77.1059306
2086 0.8128637 81.2863660
2087 0.8109414 81.0941369

4.3 Results

4.3.1 Conclusion

So far, we have explored and prepared the dataset to understand how lifestyle factors influence obesity in a sample from Mexico, Peru, and Colombia. Because the dataset was pre-processed with SMOTE to address class imbalance, the obesity categories are balanced, which supports an in-depth analysis of key variables such as eating habits, physical activity, and alcohol consumption. Through correlation analysis, we identified the variables most strongly associated with obesity levels, which guided our selection of factors for the next modeling phase. We also cleaned and structured the data by renaming variables for clarity, formatting categorical variables, and removing duplicates, giving a solid foundation for modeling.
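As an illustration of the preparation steps summarized above, the following minimal sketch removes duplicate rows and ranks candidate predictors by the absolute value of their Pearson correlation with BMI. It is not the code used in this report: the data are randomly generated, and the variable names (`physical_activity`, `vegetable_intake`, `water_intake`) and helper functions are hypothetical.

```python
import numpy as np

def drop_duplicate_rows(rows):
    """Return the unique rows of a 2-D array (note: rows come back sorted)."""
    return np.unique(rows, axis=0)

def rank_by_correlation(X, y, names):
    """Rank the columns of X by absolute Pearson correlation with y."""
    corrs = [float(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(-np.abs(corrs))  # strongest association first
    return [(names[j], corrs[j]) for j in order]

# Synthetic stand-in data (assumption): three standardized lifestyle variables.
rng = np.random.default_rng(42)
n = 300
names = ["physical_activity", "vegetable_intake", "water_intake"]
X = rng.normal(size=(n, 3))
bmi = 27.0 - 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Append ten repeated rows to mimic duplicate records, then remove them.
data = np.column_stack([X, bmi])
data_with_dupes = np.vstack([data, data[:10]])
clean = drop_duplicate_rows(data_with_dupes)

ranking = rank_by_correlation(clean[:, :3], clean[:, 3], names)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute correlation is only a screening step: it ignores interactions and non-linear effects, so it can guide variable selection but does not replace model-based checks.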

The next steps involve constructing regression models to analyze how the selected factors relate to obesity levels and how well they predict them. Specifically, we will develop two versions of the model, one that includes extreme values and one that excludes them, to evaluate the impact of outliers on model accuracy and stability. We will use R² and p-values to assess model fit and coefficient significance, and the variance inflation factor (VIF) to detect potential multicollinearity. Following this, we will build and fine-tune a predictive model, using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R² to validate and improve its performance.
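The planned comparison can be sketched as follows. This is a minimal example under stated assumptions, not the report's implementation: the data are synthetic, the helper names are hypothetical, and the 1.5 × IQR rule is just one common way to flag extreme values (the report does not specify which rule is used). The VIF of predictor j is computed as 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients; an intercept column is prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for a set of predictions."""
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(resid))),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
    }

def vif(X):
    """Variance inflation factor of each column of X."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        beta = fit_ols(others, X[:, j])
        r2 = regression_metrics(X[:, j], predict(beta, others))["R2"]
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def iqr_mask(y, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    return (y >= q1 - k * iqr) & (y <= q3 + k * iqr)

# Synthetic stand-in data (assumption): three predictors and a BMI-like target.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = 25.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)
y[:5] += 30.0  # inject a few extreme values

# Version 1: all observations. Version 2: extreme values excluded.
metrics_all = regression_metrics(y, predict(fit_ols(X, y), X))
keep = iqr_mask(y)
metrics_trim = regression_metrics(y[keep], predict(fit_ols(X[keep], y[keep]), X[keep]))
vifs = vif(X)

print("with extremes:   ", metrics_all)
print("without extremes:", metrics_trim)
print("VIF per predictor:", np.round(vifs, 3))
```

The metrics here are computed on the data used for fitting, which is enough to compare the two versions but tends to be optimistic; validating a predictive model would normally use a held-out test set or cross-validation.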

These efforts will culminate in a final report that highlights our findings and offers insights into the most influential lifestyle factors affecting obesity. Because the work is primarily an exercise, its conclusions are not intended to be applied in real-world contexts. The aim is to provide recommendations within a simulated scenario and to illustrate how data-driven insights could support public health strategies focused on obesity reduction.

